Episode Details

Inference engineering and the real-world deployment of LLMs, with Philip Kiely

Description

Podcast: Complex Systems with Patrick McKenzie (patio11)
Episode: Inference engineering and the real-world deployment of LLMs, with Philip Kiely
Pub date: 2026-03-12

Patrick McKenzie (patio11) and Philip Kiely, early employee at Baseten, discuss the inference stack: the critical layer of software and hardware that sits between a model’s weights and a user’s prompt. They cover inference engineering, how intermediate layers are evolving over a technical stack that is changing every six months, and how sophisticated organizations are actually consuming LLMs beyond just writing their questions into chatbot apps.
–

Full transcript available here: www.complexsystemspodcast.com/inference-engineering-with-philip-kiely/

–
Presenting Sponsors: Mercury, Meter, & Granola

Complex Systems is presented by Mercury—radically better banking for founders. Mercury offers the best wire experience anywhere: fast, reliable, and free for domestic U.S. wires, so you can stay focused on growing your business. Apply online in minutes at mercury.com.

Networking infrastructure has a way of accumulating technical debt faster than almost anything else in IT. Meter handles the full stack (wired, wireless, and cellular) as a single integrated solution: designed, deployed, and managed end-to-end so there's only one vendor to call when something goes wrong. Visit meter.com/complexsystems to book a demo.

If meetings consistently leave you with hazy action items and lost context, Granola handles the transcription so you can actually participate, then gives you searchable notes afterward. Try it free at granola.ai/complexsystems with code COMPLEXSYSTEMS.
–

Links:

Download Inference Engineering: https://www.baseten.com/inference-engineering/
Philip's website: https://philipkiely.com/
Stripe's Emily Sands on Complex Systems: https://www.complexsystemspodcast.com/episodes/the-past-present-and-future-of-ai-with-stripe/
Des Traynor on Complex Systems: https://www.complexsystemspodcast.com/episodes/des-traynor/

–

Timestamps:
(00:00) Intro
(00:30) The AI deployment pipeline
(03:04) Evolution of abstraction layers in engineering
(05:14) Defining inference and model weights
(08:45) Architecture of langu
