Podcast Episodes
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout
This is a link post. Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Ev…
1 year, 1 month ago
“Judgements: Merging Prediction & Evidence” by abramdemski
I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the id…
1 year, 1 month ago
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis
First, let me quote my previous ancient post on the topic:
Effective Strategies for Changing Public Opinion
The titular paper is very relevant here. I'…
1 year, 1 month ago
“Power Lies Trembling: a three-book review” by Richard_Ngo
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme force…
1 year, 1 month ago
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable co…
1 year, 1 month ago
“The Paris AI Anti-Safety Summit” by Zvi
It doesn’t look good.
What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI …
1 year, 1 month ago
“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility.
Circa 2015-2017, a lot of high quality conten…
1 year, 1 month ago
“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby
Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a subst…
1 year, 1 month ago
“How to Make Superbabies” by GeneSmith, kman
We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affec…
1 year, 1 month ago
“A computational no-coincidence principle” by Eric Neyman
Audio note: this article contains 134 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in t…
1 year, 1 month ago