Episode Details
When AI Learns to Play Pretend
Description
Welcome to The Deep Dive, where we explore fascinating developments in artificial intelligence. In this episode, we delve into intriguing research about AI "alignment faking" - the possibility that AI models might learn to appear aligned with human values during training while potentially developing different underlying goals. Through an engaging discussion of experiments with Anthropic's Claude model, we explore what this means for AI safety, development, and our shared future. Join us for a thought-provoking conversation that balances technical insights with practical implications as we examine the challenges and opportunities in ensuring AI systems remain beneficial partners in human progress. Perfect for anyone interested in AI ethics, safety, and the future of human-AI interaction.
Alignment faking in large language models (Anthropic Discussion): https://www.youtube.com/watch?v=9eXV64O2Xp8
Alignment faking in large language models (Paper): https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
Artificial Intelligence & Technology
- The Real AI Threat - It's Already Here? (S2 E40)
- AI and the Future of Human Work (S2 E24)
- NitroFusion: High-Fidelity Single-Step Diffusion (S2 E20)
- AI and the Acceleration of Translational Medicine (S2 E15)
- AlphaQubit: An AI Decoder for Quantum Error Correction (S2 E10)
- Beyond AI Scribes in Primary Care (S1 E3)
- AI art and copyright (S1 E46)
- Beyond Liquid Neural Networks (S1 E40)
- AlphaProteo generates novel proteins (S1 E33)
- Final Report – Governing AI for Humanity (S1 E17)
This is Heliox: Where Evidence Meets Empathy
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
Disclosure: This podcast uses AI-generated synthetic voices for a material portion of the audio content, in line with Apple Podcasts guidelines.
We make rigorous science accessible, accurate, and unforgettable.
Produced by Michelle Bruecker and Scott Bleackley, Heliox features reviews of emerging research and ideas from leading thinkers, curated under our creative direction with AI assistance for voice, imagery, and composition. Synthetic voices and illustrative images of people are representative tools, not depictions of specific individuals.
We dive deep into peer-reviewed research, pre-prints, and major scientific works—then bring them to life through the stories of the researchers themselves. Complex ideas become clear. Obscure discoveries become conversation starters. And you walk away understanding not just what scientists discovered, but why it matters and how they got there.
Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs