Podcast Episodes

“SAE feature geometry is outside the superposition hypothesis” by jake_mendel

Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain cr…

1 year, 8 months ago

“Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data” by Johannes Treutlein, Owain_Evans

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. TL;DR: We published a new paper on out-of-c…

1 year, 8 months ago

“Boycott OpenAI” by PeterMcCluskey

This is a link post. I have canceled my OpenAI subscription in protest over OpenAI's lack of ethics.

In particular, I object to:

threats to confiscate d…

1 year, 8 months ago

“Sycophancy to subterfuge: Investigating reward tampering in large language models” by evhub, Carson Denison

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. New Anthropic model organisms research pape…

1 year, 8 months ago

“I would have shit in that alley, too” by Declan Molony

After living in a suburb for most of my life, when I moved to a major U.S. city the first thing I noticed was the feces. At first I assumed it was do…

1 year, 8 months ago

“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by ryan_greenblatt

I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate …

1 year, 8 months ago

“Why I don’t believe in the placebo effect” by transhumanist_atom_understander

Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychologi…

1 year, 8 months ago

“Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)” by Andrew_Critch

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. As an AI researcher who wants to do technical work that helps h…

1 year, 8 months ago

“My AI Model Delta Compared To Christiano” by johnswentworth

Preamble: Delta vs Crux

This section is redundant if you already read My AI Model Delta Compared To Yudkowsky.

I don’t natively think in terms of crux…

1 year, 8 months ago

“My AI Model Delta Compared To Yudkowsky” by johnswentworth

Preamble: Delta vs Crux

I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delt…

1 year, 8 months ago
