Podcast Episodes
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
A pdf version of this report is available here.
Summary.
In this report we argue that AI systems capable of large scale scientific research will like…
2 years, 2 months ago
Making every researcher seek grants is a broken model
This is a linkpost for https://rootsofprogress.org/the-block-funding-model-for-science
When Galileo wanted to study the heavens through his telescope,…
2 years, 2 months ago
The case for training frontier AIs on Sumerian-only corpus
Let your every day be full of joy, love the child that holds your hand, let your wife delight in your embrace, for these alone are the concerns of hu…
2 years, 2 months ago
This might be the last AI Safety Camp
We are organising the 9th edition without funds. We have no personal runway left to do this again. We will not run the 10th edition without funding. …
2 years, 2 months ago
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Crossposted from substack.
As we all know, sugar is sweet and s…
2 years, 2 months ago
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:
https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-u…
2 years, 2 months ago
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
This is a linkpost for https://arxiv.org/abs/2401.05566
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:…
2 years, 2 months ago
The impossible problem of due process
I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:
because I …
2 years, 3 months ago
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith
"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"
This is the first essay in a series that I’…
2 years, 3 months ago
Introducing Alignment Stress-Testing at Anthropic
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Following on from our recent paper, “Sleeper Agents: Training D…
2 years, 3 months ago