Podcast Episodes
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub
We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned obj…
1 year, 1 month ago
“The Most Forbidden Technique” by Zvi
The Most Forbidden Technique is training an AI using interpretability techniques.
An AI produces a final output [X] via some method [M]. You can analy…
1 year, 1 month ago
“Trojan Sky” by Richard_Ngo
You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them as soon as you wake up every morning. Keep your …
1 year, 1 month ago
“OpenAI:” by Daniel Kokotajlo
Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's basically the first steps along the research agenda I s…
1 year, 1 month ago
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their prod…
1 year, 1 month ago
“So how well is Claude playing Pokémon?” by Julian Bradshaw
Background: After the release of Claude 3.7 Sonnet, an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The liv…
1 year, 1 month ago
“Methods for strong human germline engineering” by TsviBT
Note: an audio narration is not available for this article. Please see the original text.
The original text contained 169 footnotes which were omitt…
1 year, 1 month ago
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth
In a recent post, Cole Wyeth makes a bold claim:
. . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done …
1 year, 1 month ago
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis
This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading.
I'm not fu…
1 year, 1 month ago
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard
This is a critique of How to Make Superbabies on LessWrong.
Disclaimer: I am not a geneticist, and I've tried to use as little jargon as possible. …
1 year, 1 month ago