Podcast Episodes

Refusal in LLMs is mediated by a single direction

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This work was produced as part of Neel Nanda's stream in the ML…

2 years, 2 months ago

Funny Anecdote of Eliezer From His Sister

This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Shev…

2 years, 3 months ago

Thoughts on seed oil

This is a linkpost for https://dynomight.net/seed-oil/ A friend has spent the last three years hounding me about seed oils. Every time I thought I was…

2 years, 3 months ago

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Yesterday Adam Shai put up a cool post which… well, take a look at the visual:

Yup, it sure looks like that fractal is very noisily embedded in the re…

2 years, 3 months ago

Express interest in an “FHI of the West”

TLDR: I am investigating whether to found a spiritual successor to FHI, housed under Lightcone Infrastructure, providing a rich cultural environment …

2 years, 3 months ago

Transformers Represent Belief State Geometry in their Residual Stream

Produced while being an affiliate at PIBBSS. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS.…

2 years, 3 months ago

Paul Christiano named as US AI Safety Institute Head of AI Safety

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety U.S. …

2 years, 3 months ago

[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

This is a linkpost for https://twitter.com/ESYudkowsky/status/…

2 years, 3 months ago

[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

In January, I defended my PhD thesis, which I called Algorithm…

2 years, 3 months ago

[HUMAN VOICE] "Toward a Broader Conception of Adverse Selection" by Ricki Heicklen

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

This is a linkpost for https://bayesshammai.substack.com/p/con…

2 years, 3 months ago
