Podcast Episodes
“So how well is Claude playing Pokémon?” by Julian Bradshaw
Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The liv…
11 months, 3 weeks ago
“Methods for strong human germline engineering” by TsviBT
Note: an audio narration is not available for this article. Please see the original text.
The original text contained 169 footnotes which were omitt…
11 months, 3 weeks ago
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth
In a recent post, Cole Wyeth makes a bold claim:
. . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done …
11 months, 3 weeks ago
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis
This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading.
I'm not fu…
11 months, 3 weeks ago
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard
This is a critique of How to Make Superbabies on LessWrong.
Disclaimer: I am not a geneticist[1], and I've tried to use as little jargon as possible. …
11 months, 3 weeks ago
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout
This is a link post. Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Ev…
11 months, 3 weeks ago
“Judgements: Merging Prediction & Evidence” by abramdemski
I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the id…
1 year ago
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis
First, let me quote my previous ancient post on the topic:
Effective Strategies for Changing Public Opinion
The titular paper is very relevant here. I'…
1 year ago
“Power Lies Trembling: a three-book review” by Richard_Ngo
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme force…
1 year ago
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable co…
1 year ago