Podcast Episodes
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
A pdf version of this report is available here.
Summary.
In this report we argue that AI systems capable of large scale scientific research will like…
2 years, 2 months ago
Making every researcher seek grants is a broken model
This is a linkpost for https://rootsofprogress.org/the-block-funding-model-for-science
When Galileo wanted to study the heavens through his telescope,…
2 years, 2 months ago
The case for training frontier AIs on Sumerian-only corpus
Let your every day be full of joy, love the child that holds your hand, let your wife delight in your embrace, for these alone are the concerns of hu…
2 years, 2 months ago
This might be the last AI Safety Camp
We are organising the 9th edition without funds. We have no personal runway left to do this again. We will not run the 10th edition without funding. …
2 years, 2 months ago
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Crossposted from substack.
As we all know, sugar is sweet and s…
2 years, 2 months ago
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:
https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-u…
2 years, 2 months ago
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
This is a linkpost for https://arxiv.org/abs/2401.05566
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:…
2 years, 2 months ago
The impossible problem of due process
I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:
because I …
2 years, 3 months ago
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith
"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"
This is the first essay in a series that I’…
2 years, 3 months ago
Introducing Alignment Stress-Testing at Anthropic
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Following on from our recent paper, “Sleeper Agents: Training D…
2 years, 3 months ago