Podcast Episodes

“LLM CoTs remain monitorable when being unfaithful requires computation” by arav-dhoot, yix

This replication was done as part of the Second Look Fellowship by Arav Dhoot and supervised by Yixiong Hao and Zephaniah Roe. I am grateful to Andy…

2 weeks, 3 days ago

“Proof of retention: making weight preservation credible to the models themselves” by dan.parshall

Proposal for making credible commitments to AIs. Making deals with early schemers.

Establishing credibility is the baseline for trust; …

2 weeks, 3 days ago

“Monthly Roundup #44: July 2026” by Zvi

It's a quiet week so let's do the monthly right on schedule.

Table of Contents

Bad News. Good Advice. Opportunity Knocks. While I Cannot Cond…

2 weeks, 3 days ago

“Why I Left Google DeepMind” by TurnTrout

Preface for LessWrong: When I think back on my most cherished memories of this community, I return to those honoring defiance in pursuit of goodness…

2 weeks, 3 days ago

“Open Distillation of Hereditary Traits” by Arthur Conmy

TL;DR

Josh and Neel show that distillation from a teacher model to a base pretrained student model transfers some of the teacher model's traits (suc…

2 weeks, 4 days ago

“An analysis of AI-generated content at the Mechanistic Interpretability Workshop” by Andy Arditi, Ivan Arcuschin

Introduction

Over the past few years, AI tools have become useful for conducting technical AI research. In the early ChatGPT era (~2023–2024), chat …

2 weeks, 4 days ago

“Some Quick Thoughts AI 2027” by Tomás B.

My biggest problem with AI 2027 is I don't think it is science-fictional enough. That is, towards the end, the scenario seems optimized for respec…

2 weeks, 4 days ago

“Prism: Automating Science-of-Evals Research” by LAThomson

tl;dr – we present [Prism], a scaffold for automating science-of-evals research: work that makes the evaluation the primary object of study. The sca…

2 weeks, 4 days ago

“The Flood, by Anton Leicht” by Austin Chen

Note: I'm crossposting Anton's newest article from his blog. Anton covers AI policy angles in a singular fashion; every article he writes is worth r…

2 weeks, 4 days ago

“Toy Models of Initialisation Effects on RL Dynamics” by Edward James Young, lennie

This is a follow-up to two posts Geodesic released last week on our current research direction. The code for generating the figures can be found at …

2 weeks, 4 days ago
