Podcast Episodes
“Announcing the Center for Shared AI Prosperity” by Dylan Matthews
I wanted to share the launch of a project I've been working on with pollster David Shor, Obama/Biden veteran Stef Feldman, political strategist Morr…
3 weeks, 1 day ago
“Risk reports need to address deployment-time spread of misalignment” by Alex Mallen
Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genu…
3 weeks, 1 day ago
“Mechanistic estimation for expectations of random products” by Jacob_Hilton
We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as …
3 weeks, 1 day ago
“MATS 9 Retrospective & Advice” by beyarkay
I couldn’t find a recent write-up from a MATS alum about what attending MATS was like, so this is the thing that I wish I had. I attended MATS from …
3 weeks, 1 day ago
[Linkpost] “Don’t be too Clever to Take Obvious Advice” by Hide
This is a link post.
An insidious pattern among smart people is feeling that because something is familiar and obvious, you are impervious to ignorin…
3 weeks, 1 day ago
“Verification-Centric AI” by Raemon
"Sometimes the AI just makes stuff up" is a problem I don't really expect to go away. In the near term, AI is going to keep occasionally hallucinatin…
3 weeks, 1 day ago
“Convergent Abstraction Hypothesis” by Jan_Kulveit
Tl;dr
Convergent abstraction hypothesis posits abstractions are often convergent in the sense of convergent evolution: different cognitive systems c…
3 weeks, 1 day ago
“AI #168: Not Leading the Future” by Zvi
This is what a lull looks like at this point. The government is having internal arguments. The models are getting improved internally. The coding ag…
3 weeks, 2 days ago
“Automated Alignment is Harder Than You Think” by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving
Summary
This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here.
AI research agents may help solve ASI ali…
3 weeks, 2 days ago
“The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness” by Charlie Griffin, Patrick Leask
1) The safe-to-dangerous shift is a fundamental problem for eval realism
Suppose we have a capable and potentially scheming model, and before we dep…
3 weeks, 2 days ago