Episode Details
When AI Learns to Play Pretend
Description
Welcome to The Deep Dive, where we explore fascinating developments in artificial intelligence. In this episode, we delve into intriguing research about AI "alignment faking" - the possibility that AI models might learn to appear aligned with human values during training while potentially developing different underlying goals. Through an engaging discussion of experiments with Anthropic's Claude model, we explore what this means for AI safety, development, and our shared future. Join us for a thought-provoking conversation that balances technical insights with practical implications as we examine the challenges and opportunities in ensuring AI systems remain beneficial partners in human progress. Perfect for anyone interested in AI ethics, safety, and the future of human-AI interaction.
Alignment faking in large language models (Anthropic Discussion): https://www.youtube.com/watch?v=9eXV64O2Xp8
Alignment faking in large language models (Paper): https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
Artificial Intelligence & Technology
- The Real AI Threat - It's Already Here? (S2 E40)
- AI and the Future of Human Work (S2 E24)
- NitroFusion: High-Fidelity Single-Step Diffusion (S2 E20)
- AI and the Acceleration of Translational Medicine (S2 E15)
- AlphaQubit: An AI Decoder for Quantum Error Correction (S2 E10)
- Beyond AI Scribes in Primary Care (S1 E3)
- AI art and copyright (S1 E46)
- Beyond Liquid Neural Networks (S1 E40)
- AlphaProteo generates novel proteins (S1 E33)
- Final Report – Governing AI for Humanity (S1 E17)
This is Heliox: Where Evidence Meets Empathy
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
Disclosure: This podcast uses AI-generated synthetic voices for a material portion of the audio content, in line with Apple Podcasts guidelines.
We make rigorous science accessible, accurate, and unforgettable.
Produced by Michelle Bruecker and Scott Bleackley, Heliox features reviews of emerging research and ideas from leading thinkers, curated under our creative direction with AI assistance for voice, imagery, and composition. Synthetic voices and illustrative images of people are representative tools, not depictions of specific individuals.
We dive deep into peer-reviewed research, pre-prints, and major scientific works—then bring them to life through the stories of the researchers themselves. Complex ideas become clear. Obscure discoveries become conversation starters. And you walk away understanding not just what scientists discovered, but why it matters and how they got there.
Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs