Podcast Episodes
“Does Hebrew Have Verbs?” by Benquo
Spinoza's Compendium of Hebrew Grammar (1677, posthumous, unfinished) contains a claim that scholars have been misreading for centuries. He says tha…
1 month ago
“Untrusted Monitoring is Default; Trusted Monitoring is not” by J Bostock
These views are my own and not necessarily representative of those of any colleagues with whom I have worked on AI control.
TL;DR: It's much cheaper…
1 month ago
“China Derangement Syndrome” by Arjun Panickssery
Often I see people claim it's essential for America to win the AI race against China (in whatever sense) for reasons like these:
“What is the reaso…
1 month ago
“Contrastive features elicit different perturbation responses than SAE features” by Francisco Ferreira da Silva, StefanHex
Note: This is a research update sharing preliminary results as part of ongoing work.
Figure 1: Contrastive (difference-of-means, English→Mandarin) f…
1 month ago
“Confusion around the term reward hacking” by ariana_azarbal
Summary: "Reward hacking" commonly refers to two different phenomena: misspecified-reward exploitation, where RL reinforces undesired behaviors that…
1 month ago
“A List of Research Directions in Character Training” by Rauno Arike
Thanks to Rohan Subramani, Ariana Azarbal, and Shubhorup Biswas for proposing some of the ideas and helping develop them during a sprint. Thanks to …
1 month ago
“The Distaff Texts” by Tomás B.
Though I spend most of my time studying what is labelled “history” in some manuscripts and “malignant lies” in others and the “siren scrawls of that…
1 month ago
“Intention vs. Trying: Separate Prediction from Goal-Seeking” by plex
tl;dr: Mixing goal-directedness into cognitive processes that are working to truth-seek about possible futures tends to undermine both truth-seeking…
1 month ago
“On restraining AI development for the sake of safety” by Joe Carlsmith
(Podcast version, read by the author, here, or search for "Joe Carlsmith Audio" on your podcast app.)
This is the tenth essay in a series I’m calling…
1 month ago
“Nullius in Verba” by Aurelia
Independent verification by the Brain Preservation Foundation and the Survival and Flourishing Fund — the results so far
Cultivating independent ver…
1 month ago