Podcast Episodes

“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub

We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned obj…

1 year, 1 month ago

“The Most Forbidden Technique” by Zvi

The Most Forbidden Technique is training an AI using interpretability techniques.

An AI produces a final output [X] via some method [M]. You can analy…

1 year, 1 month ago

“Trojan Sky” by Richard_Ngo

You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them as soon as you wake up every morning. Keep your …

1 year, 1 month ago

“OpenAI:” by Daniel Kokotajlo

Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's basically the first steps along the research agenda I s…

1 year, 1 month ago

“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis

LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their prod…

1 year, 1 month ago

“So how well is Claude playing Pokémon?” by Julian Bradshaw

Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The liv…

1 year, 1 month ago

“Methods for strong human germline engineering” by TsviBT

Note: an audio narration is not available for this article. Please see the original text.

The original text contained 169 footnotes which were omitt…

1 year, 1 month ago

“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth

In a recent post, Cole Wyeth makes a bold claim:

. . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done …

1 year, 1 month ago

“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis

This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading.

I'm not fu…

1 year, 1 month ago

“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard

This is a critique of How to Make Superbabies on LessWrong.

Disclaimer: I am not a geneticist[1], and I've tried to use as little jargon as possible. …

1 year, 1 month ago
