Podcast Episodes
“EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Part 13 of 12 in the Engineer's Interpretability Sequence.
TL;D…
1 year, 9 months ago
“What’s Going on With OpenAI’s Messaging?” by ozziegoen
This is a quickly written opinion piece on what I understand about OpenAI. I first posted it to Facebook, where it had some discussion.
Some argum…
1 year, 9 months ago
“Language Models Model Us” by eggsyntax
Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow
One-sentence summary: On a dataset of human-written essa…
1 year, 9 months ago
Jaan Tallinn’s 2023 Philanthropy Overview
This is a link post. To follow up on my philanthropic pledge from 2020, I've updated my philanthropy page with 2023 results.
in 2023 my donations funded $4…
1 year, 9 months ago
“OpenAI: Exodus” by Zvi
Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board…
1 year, 9 months ago
DeepMind’s “Frontier Safety Framework” is weak and unambitious
FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP.…
1 year, 9 months ago
Do you believe in hundred dollar bills lying on the ground? Consider humming
Introduction.
[Reminder: I am an internet weirdo with no medical credentials]
A few months ago, I published some crude estimates of the power of nitr…
1 year, 9 months ago
Deep Honesty
Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are …
1 year, 9 months ago
On Not Pulling The Ladder Up Behind You
Epistemic Status: Musing and speculation, but I think there's a real thing here.
1.
When I was a kid, a friend of mine had a tree fort. If you've neve…
1 year, 9 months ago
Mechanistically Eliciting Latent Behaviors in Language Models
Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout).
TL;DR: I introduce a method for eliciting latent be…
1 year, 10 months ago