Podcast Episodes

Source:
https://www.lesswrong.com/posts/HKCKinBgsKKvjQyWK/read-the-pricing-first

Narrated by…

10 months ago

“A quick list of reward hacking interventions” by Alex Mallen

This is a quick list of interventions that might help fix issues from reward hacking.

(We’re referring to the general definition of reward hacking: …

10 months ago

“Ghiblification for Privacy” by jefftk

I often want to include an image in my posts to give a sense of a situation. A photo communicates the most, but sometimes that's too much: some partic…

10 months ago

“Broad-Spectrum Cancer Treatments” by sarahconstantin

Midjourney, “engraving of Apollo shooting his bow at a distant cancer cell”

Introduction and Principles

The conventional wisdom is that we can’t “cur…

10 months ago

“When is it important that open-weight models aren’t released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.” by ryan_greenblatt

Recently, Anthropic released Opus 4 and said they couldn't rule out the model triggering ASL-3 safeguards due to the model's CBRN capabilities. That…

10 months, 1 week ago

“Outer Alignment is the Necessary Compliment to AI 2027’s Best Case Scenario” by Josh Hickman

To the extent we believe more advanced training and control techniques will lead to alignment of agents capable enough to strategically make success…

10 months, 1 week ago

“Dwarkesh Patel on Continual Learning” by Zvi

A key question going forward is the extent to which making further AI progress will depend upon some form of continual learning. Dwarkesh Patel offe…

10 months, 1 week ago

“Personal Agents: AIs as trusted advisors, caretakers, and user proxies” by JWJohnston

Just posted the following on Medium. Interested in comments from readers here, especially pointers to similar efforts and ideas I didn't mention bel…

10 months, 1 week ago

[Linkpost] “METR: Recent frontier models are reward hacking” by Daniel Kokotajlo

This is a link post. METR just made a lovely post detailing many examples they've found of reward hacks by frontier models. Unlike the reward hacks o…

10 months, 1 week ago

[Linkpost] “Identifying ‘Deception Vectors’ In Models” by Stephen Martin

This is a link post. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting ”…

10 months, 1 week ago
