Podcast Episodes
"Current AIs seem pretty misaligned to me" by ryan_greenblatt
Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're sup…
an hour ago
"Annoyingly Principled People, and what befalls them" by Raemon
Here are two beliefs that are sort of haunting me right now:
Folk who try to push people to uphold principles (whether established ones or novel one…
5 hours ago
"Morale" by J Bostock
One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your m…
19 hours ago
"Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes" by Alex Mallen, ryan_greenblatt
It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at…
21 hours ago
"The policy surrounding Mythos marks an irreversible power shift" by sil
This post assumes Anthropic isn't lying:
Mythos is the current SOTA; Mythos is potent[1]; Anthropic will not make it publicly available un-nerfed[2]; Anth…
1 day, 11 hours ago
"Only Law Can Prevent Extinction" by Eliezer Yudkowsky
There's a quote I read as a kid that stuck with me my whole life:
"Remember that all tax revenue is the result of holding a gun to somebody's head. …
1 day, 18 hours ago
"Dario probably doesn’t believe in superintelligence" by RobertM
Epistemic status: I think this is true but don't think this post is a very strong argument for the case, or particularly interesting to read. But I …
1 day, 20 hours ago
"Daycare illnesses" by Nina Panickssery
Before I had a baby I was pretty agnostic about the idea of daycare. I could imagine various pros and cons but I didn’t have a strong overall opinio…
2 days, 7 hours ago
"If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines" by ryan_greenblatt
Anthropic's system card for Mythos Preview says:
It's unclear how we should interpret this. What do they mean by productivity uplift? To what exten…
2 days, 22 hours ago
"Do not be surprised if LessWrong gets hacked" by RobertM
Or, for that matter, anything else.
This post is meant to be two things:
a PSA about LessWrong's current security posture, from a LessWrong admin[1]…
5 days, 21 hours ago