Podcast Episodes
[Linkpost] “I’m starting a substack” by leogao
This is a link post.
---
First published:
March 17th, 2026
Source:
https://www.lesswrong.com/posts/Sw…
1 month ago
““Act-based approval-directed agents”, for IDA skeptics” by Steven Byrnes
Summary / tl;dr
In the 2010s, Paul Christiano built an extensive body of work on AI alignment—see the “Iterated Amplification” series for a curated …
1 month ago
“Consciousness Cluster: Preferences of Models that Claim they are Conscious” by James Chua, Owain_Evans, Sam Marks, Jan Betley
TLDR;
GPT-4.1 denies being conscious or having feelings.
We train it to say it's conscious to see what happens.
Result: It acquires new preference…
1 month ago
“LessOnline ticket sales are live! (Earlybird pricing until April 7)” by Ruby, Ronny Fernandez, Ben Pace
LessOnline is back in 2026, its third year running. As usual, it will take place at Lighthaven in Berkeley, CA. Tickets are live at less.online
When…
1 month ago
[Linkpost] “The Psychopathy Spectrum” by Dawn Drescher
This is a link post.
The term “psychopathy” is a mess, so I've written a sequence to tease apart all the different meanings along several dimensions.…
1 month ago
“Sycophancy Towards Researchers Drives Performative Misalignment” by Taywon Min, rustem17, David Vella Zarb
This work was done by Rustem Turtayev, David Vella Zarb, and Taywon Min during MATS 9.0, mentored by Shi Feng, based on prior work by David Baek. We…
1 month ago
“Extracting Performant Algorithms Using Mechanistic Interpretability” by Ihor Kendiukhov
A Prequel: The Tree of Life Inside a DNA Language Model
Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, qui…
1 month ago
“Requiem for a Transhuman Timeline” by Ihor Kendiukhov
The world was fair, the mountains tall,
In Elder Days before the fall
Of mighty kings in Nargothrond
And Gondolin, who now beyond
The Western Seas h…
1 month ago
“Adding Typos Made Haiku’s Accuracy Go Up” by bira
We are curious if large language models behave consistently when user prompts contain typos. To explore this, we ran a small experiment injecting ty…
1 month ago
“LLMs as Giant Lookup-Tables of Shallow Circuits” by niplav, Claude+
Early 2026 LLMs in scaffolds, from simple ones such as giving the model access to a scratchpad/"chain of thought" up to MCP servers, skills, and con…
1 month ago