Podcast Episodes

Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)

While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she share…

4 months, 2 weeks ago

Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters

While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of par…

5 months ago

Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy

Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic — it’s a feature, not a bug. Eric Ma leads resea…

5 months ago

Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference

Colab is cozy. But production won’t fit on a single GPU.
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo…

5 months, 3 weeks ago

Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs

Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s …

6 months ago

Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)

Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, e…

6 months, 1 week ago

Episode 51: Why We Built an MCP Server and What Broke First

What does it take to actually ship LLM-powered features, and what breaks when you connect them to real production data?
In this episode, we hear from …

6 months, 2 weeks ago

Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain

If we want AI systems that actually work, we need to get much better at evaluating them, not just building more pipelines, agents, and frameworks.
In …

6 months, 4 weeks ago

Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)

If we want AI systems that actually work in production, we need better infrastructure—not just better models.
In this episode, Hugo talks with Akshay …

7 months, 1 week ago

Episode 48: How to Benchmark AGI with Greg Kamradt (ARC-AGI)

If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it.
In this episode, Hugo talks with Greg Kamr…

7 months, 3 weeks ago
