Podcast Episodes
"Sum-threshold attacks" by TsviBT
How do you affect something far away, a lot, without anyone noticing?
(Note: you can safely skip sections. It is also safe to skip the essay entirely,…
2 years, 7 months ago
"A list of core AI safety problems and how I hope to solve them" by Davidad
Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try to …
2 years, 7 months ago
"Report on Frontier Model Training" by Yafah Edelman
This is a linkpost for https://docs.google.com/document/d/1TsYkDYtV6BKiCN9PAOirRAy3TrNDu2XncUZ5UZfaAKA/edit?usp=sharing
Understanding what drives the …
2 years, 7 months ago
"Defunding My Mistake" by ymeskhout
Until about five years ago, I unironically parroted the slogan All Cops Are Bastards (ACAB) and earnestly advocated to abolish the police and prison …
2 years, 7 months ago
"Sharing Information About Nonlinear" by Ben Pace
Added (11th Sept): Nonlinear have commented that they intend to write a response, have written a short follow-up, and claim that they dispute 85 clai…
2 years, 7 months ago
"One Minute Every Moment" by abramdemski
About how much information are we keeping in working memory at a given moment?
"Miller's Law" dictates that the number of things humans can hold in wo…
2 years, 7 months ago
"What I would do if I wasn’t at ARC Evals" by LawrenceC
In which: I list 9 projects that I would work on if I wasn’t busy working on safety standards at ARC Evals, and explain why they might be good to wor…
2 years, 7 months ago
"The U.S. is becoming less stable" by lc
We focus so much on arguing over who is at fault in this country that I think sometimes we fail to alert on what's actually happening. I would just l…
2 years, 7 months ago
"Meta Questions about Metaphilosophy" by Wei Dai
To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age t…
2 years, 7 months ago
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the ten…
2 years, 7 months ago