Podcast Episodes
Do you believe in hundred dollar bills lying on the ground? Consider humming
Introduction.
[Reminder: I am an internet weirdo with no medical credentials]
A few months ago, I published some crude estimates of the power of nitr…
1 year, 10 months ago
Deep Honesty
Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are …
1 year, 11 months ago
On Not Pulling The Ladder Up Behind You
Epistemic Status: Musing and speculation, but I think there's a real thing here.
1.
When I was a kid, a friend of mine had a tree fort. If you've neve…
1 year, 11 months ago
Mechanistically Eliciting Latent Behaviors in Language Models
Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).
TL;DR: I introduce a method for eliciting latent be…
1 year, 11 months ago
Ironing Out the Squiggles
Adversarial Examples: A Problem
The apparent successes of the deep learning revolution conceal a dark underbelly. It may seem that we now know how to…
1 year, 11 months ago
Introducing AI Lab Watch
This is a linkpost for https://ailabwatch.org
I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluate…
1 year, 11 months ago
Refusal in LLMs is mediated by a single direction
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This work was produced as part of Neel Nanda's stream in the ML…
1 year, 11 months ago
Funny Anecdote of Eliezer From His Sister
This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Shev…
1 year, 11 months ago
Thoughts on seed oil
This is a linkpost for https://dynomight.net/seed-oil/
A friend has spent the last three years hounding me about seed oils. Every time I thought I was…
1 year, 11 months ago
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
Yesterday Adam Shai put up a cool post which… well, take a look at the visual:
Yup, it sure looks like that fractal is very noisily embedded in the re…
1 year, 11 months ago