Episode Details
David Hand: AI, Dark Data, LLMs, Peer Review
Episode 158
Published 2 years, 7 months ago
Description
David Hand, a professor of statistics, explains how ChatGPT and other large language models hide behind dark data, leading to misleading outputs. He also critiques the peer‑review process and the perils of partial truths in scientific modeling.
- 00:00:00 - Introduction
- 00:01:34 - What is Dark Data? (missing data matters more than what you have)
- 00:07:03 - The perils of "changing definitions"
- 00:09:15 - David on writing and his selective process
- 00:20:15 - Theory-driven vs. data-driven models (& the constitution of LLMs)
- 00:32:08 - The dilemma of partial truths
- 00:34:40 - The "File Drawer Problem" & its adverse effects on clinical trials
- 00:39:09 - Regression to the mean (how random variations lead to misleading conclusions)
- 00:44:12 - Publication bias
- 00:48:03 - Open-access models and their pitfalls
- 00:54:06 - Why LLMs are simultaneously brilliant & stupid
- 01:03:40 - David’s daily routine
- 01:06:24 - The mean vs. median
- 01:11:07 - Every type of "Dark Data" listed (watch this first!)
SPONSORS:
- Patreon: https://patreon.com/curtjaimungal
- Crypto: https://tinyurl.com/cryptoTOE
- PayPal: https://tinyurl.com/paypalTOE
- Twitter: https://twitter.com/TOEwithCurt
- Discord Invite: https://discord.com/invite/kBcnfNVwqs
- iTunes: https://podcasts.apple.com/ca/podcast...
- Pandora: https://pdora.co/33b9lfP
- Spotify: https://open.spotify.com/show/4gL14b9...
- Subreddit r/TheoriesOfEverything: https://reddit.com/r/theoriesofeveryt...
- TOE Merch: https://tinyurl.com/TOEmerch
RESOURCES:
- YouTube Link: https://www.youtube.com/watch?v=41JBrC5e5tA
- Dark Data: https://amzn.to/446Fou1
- The Improbability Principle: https://amzn.to/3DOn1iX
Theories of Everything with Curt Jaimungal features long-form, technically detailed interviews with leading researchers in physics, mathematics, consciousness, and philosophy, exploring topics at the level of active research. For academics, graduate students, and anyone seeking depth beyond popular science.
SPONSOR: I subscribe to The Economist for their science and tech coverage. As a TOE listener, get 35% off! No other podcast has this: https://economist.com/TOE
FOLLOW: Substack | Spotify | YouTube | Twitter
Learn more about your ad choices. Visit megaphone.fm/adchoices