Podcast Episodes

“Natural emergent misalignment from reward hacking in production RL” by evhub, Monte M, Benjamin Wright, Jonathan Uesato

Abstract

We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignm…

4 months, 3 weeks ago

“Anthropic is (probably) not meeting its RSP security commitments” by habryka

TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity,…

4 months, 3 weeks ago

“Varieties Of Doom” by jdp

There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" didn't feel like it ma…

4 months, 3 weeks ago

“How Colds Spread” by RobertM

It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number of s…

4 months, 4 weeks ago

“New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence” by Aaron_Scher, David Abecassis, Brian Abeyta, peterbarnett

TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards …

4 months, 4 weeks ago

“Where is the Capital? An Overview” by johnswentworth

When a new dollar goes into the capital markets, after being bundled and securitized and lent several times over, where does it end up? When society…

4 months, 4 weeks ago

“Problems I’ve Tried to Legibilize” by Wei Dai

Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk …

4 months, 4 weeks ago

“Do not hand off what you cannot pick up” by habryka

Delegation is good! Delegation is the foundation of civilization! But in the depths of delegation, madness breeds and evil rises.

In my experience, …

4 months, 4 weeks ago

“7 Vicious Vices of Rationalists” by Ben Pace

Vices aren't behaviors that one should never do. Rather, vices are behaviors that are fine and pleasurable to do in moderation, but tempting to do i…

4 months, 4 weeks ago

“Tell people as early as possible it’s not going to work out” by habryka

Context: Post #4 in my sequence of private Lightcone Infrastructure memos edited for public consumption

This week's principle is more about how I wa…

4 months, 4 weeks ago

