Podcast Episodes
Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she share…
4 months, 2 weeks ago
Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters
While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of par…
5 months ago
Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy
Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic — it’s a feature, not a bug. Eric Ma leads resea…
5 months ago
Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference
Colab is cozy. But production won’t fit on a single GPU.
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo…
5 months, 3 weeks ago
Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs
Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s …
6 months ago
Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)
Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, e…
6 months, 1 week ago
Episode 51: Why We Built an MCP Server and What Broke First
What does it take to actually ship LLM-powered features, and what breaks when you connect them to real production data?
In this episode, we hear from …
6 months, 2 weeks ago
Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain
If we want AI systems that actually work, we need to get much better at evaluating them, not just building more pipelines, agents, and frameworks.
In …
6 months, 4 weeks ago
Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)
If we want AI systems that actually work in production, we need better infrastructure—not just better models.
In this episode, Hugo talks with Akshay …
7 months, 1 week ago
Episode 48: How to Benchmark AGI with Greg Kamradt (ARC-AGI)
If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it.
In this episode, Hugo talks with Greg Kamr…
7 months, 3 weeks ago