Every AI Agent Has an Evaluation Gap | Alex Ratner, Snorkel AI
Description
Alex Ratner co-founded Snorkel AI out of Chris Ré's Stanford lab and helped establish data-centric AI as a field. Today, Snorkel is a $1.3B company that ships thousands of datasets and environments a week to frontier labs and to vertical AI teams such as Harvey.
In this episode, he argues that our ability to build AI agents has outpaced our ability to measure them, and that this gap is what keeps most enterprise agents stuck in demo purgatory.
If you can't measure it, you can't improve it. And you can't deploy it.
In this conversation:
- The three axes of the evaluation gap: input complexity, autonomy horizon, and output complexity
- Big Law Bench: how Snorkel and Harvey benchmarked legal agents on deep-research tasks that take lawyers 10-15 hours
- What Snorkel's $3M Open Benchmarks Grant is funding, and why "benchmaxxing" critiques don't kill the case for public benchmarks
- Why 40-50% of Snorkel's data work is still review and labeling, even with the best models in the loop
- The "expert-agentic" era, where domain expertise (law, finance, coding, even woodworking) is the new bottleneck
- Why self-supervision is a dead end outside narrow cases like distillation
- The false dichotomy between data and environments, and why pure-environment vendors miss how AI actually works
Chapters
(00:00) Intro: Alex Ratner and Snorkel AI
(02:50) What the evaluation gap actually is
(06:05) Moravec's paradox and the jagged frontier
(08:46) Where AI agents fall down in enterprise work
(10:40) Big Law Bench: benchmarking Harvey's legal agents
(12:00) The three axes: input, autonomy horizon, output
(18:31) Snorkel's $3M Open Benchmarks Grant
(22:33) From "janitorial" to epicenter: 15 years of data-centric AI
(29:26) The expert-agentic data era
(34:54) The false dichotomy between data and environments
(40:05) DoorDash Tasks and expert data at scale
Connect with Alex Ratner:
- X/Twitter: https://x.com/ajratner
- Snorkel AI: https://snorkel.ai
Connect with Conor:
- Newsletter: https://newsletter.chainofthought.show/
- Twitter/X: https://x.com/ConorBronsdon
- LinkedIn: https://www.linkedin.com/in/conorbronsdon/
- YouTube: https://www.youtube.com/@ConorBronsdon
More episodes: https://chainofthought.show
Thanks to Galileo. Download their free 165-page guide to mastering multi-agent systems at galileo.ai/mastering-multi-agent-systems