Podcast Episodes
Ironing Out the Squiggles
Adversarial Examples: A Problem
The apparent successes of the deep learning revolution conceal a dark underbelly. It may seem that we now know how to…
1 year, 10 months ago
Introducing AI Lab Watch
This is a linkpost for https://ailabwatch.org
I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluate…
1 year, 10 months ago
Refusal in LLMs is mediated by a single direction
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This work was produced as part of Neel Nanda's stream in the ML…
1 year, 10 months ago
Funny Anecdote of Eliezer From His Sister
This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Shev…
1 year, 10 months ago
Thoughts on seed oil
This is a linkpost for https://dynomight.net/seed-oil/
A friend has spent the last three years hounding me about seed oils. Every time I thought I was…
1 year, 10 months ago
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
Yesterday Adam Shai put up a cool post which… well, take a look at the visual:
Yup, it sure looks like that fractal is very noisily embedded in the re…
1 year, 10 months ago
Express interest in an “FHI of the West”
TLDR: I am investigating whether to found a spiritual successor to FHI, housed under Lightcone Infrastructure, providing a rich cultural environment …
1 year, 10 months ago
Transformers Represent Belief State Geometry in their Residual Stream
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS.…
1 year, 10 months ago
Paul Christiano named as US AI Safety Institute Head of AI Safety
This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety
U.S. …
1 year, 10 months ago
[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
This is a linkpost for https://twitter.com/ESYudkowsky/status/…
1 year, 10 months ago