Episode Details
NVIDIA Blackwell architecture & Azure data fabric performance: how to fix GPU I/O bottlenecks
Season 1
Published 5 months, 1 week ago
Description
(00:00:00) The AI Infrastructure Bottleneck
(00:01:06) The Data Fabric Dilemma
(00:03:51) Introducing Blackwell: A Physics Upgrade
(00:06:00) Scaling Blackwell to the Cloud
(00:08:08) The Importance of Orchestration
(00:14:01) The Data Layer Challenge
(00:18:07) Real-World Impact and Cost Savings
(00:22:19) The Future of AI Infrastructure
In this episode of M365.fm, Mirko Peters takes a deep dive into the NVIDIA Blackwell architecture and shows why most enterprise data fabrics, ETL pipelines, and storage layers are still too slow to keep modern AI and LLM workloads running at full speed. He explains how Grace‑Blackwell (GB200), NVLink, NVL72 racks, and Quantum‑X800 InfiniBand radically change the physics of data movement, collapsing CPU–GPU copies and rack‑to‑rack latency so your Azure ND GB200 v6 clusters finally operate at sustained throughput instead of burning budget on idle GPUs. You will hear concrete examples of where your current bottlenecks really sit today: latency in chatty ETL, slow storage lanes, legacy “AI‑ready” apps on old plumbing, and under‑designed data pipelines that starve even the best hardware.
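As a first step toward finding that kind of starvation in your own stack, here is a minimal sketch (not from the episode) of how you might measure input wait versus compute time in a PyTorch training loop; `dataloader` and `train_step` are hypothetical placeholders for your own pipeline:

```python
import time

import torch


def profile_input_wait(dataloader, train_step, device, num_batches=100):
    """Estimate the fraction of each step spent waiting on input data.

    A high wait fraction means the data pipeline, not the GPU, is the
    bottleneck: exactly the starvation pattern described above.
    """
    wait_s = compute_s = 0.0
    batches = iter(dataloader)
    for _ in range(num_batches):
        t0 = time.perf_counter()
        batch = next(batches)          # blocks while the pipeline catches up
        t1 = time.perf_counter()
        train_step(batch)              # forward / backward / optimizer step
        if device.type == "cuda":
            torch.cuda.synchronize(device)  # count queued GPU work as compute
        t2 = time.perf_counter()
        wait_s += t1 - t0
        compute_s += t2 - t1
    total = wait_s + compute_s
    print(f"input wait: {wait_s / total:.1%}, compute: {compute_s / total:.1%}")
```

If the wait fraction dominates, faster GPUs will not help; the fixes discussed in the episode (streaming ingestion, co-located storage) will.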
Mirko walks through how Microsoft Fabric unifies warehousing, streaming, and real‑time analytics into a high‑bandwidth data fabric that can actually feed Blackwell‑class systems at model speed, from ingestion to vectorization and tokenization. He connects this to Azure AI Foundry, NVIDIA NIM microservices, and token‑aligned pricing so you understand how to scale training, RL training loops, and high‑volume inference while keeping an eye on cost per token, perf/watt, and sustainability. By the end, you will have a practical mental model for scalability: which workloads belong on ND GB200 v6, which must move to streaming data pipelines, and which you should keep off expensive GPUs entirely because the data fabric will never keep up.
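To make the cost-per-token framing concrete, here is a back-of-envelope helper with made-up figures; real ND GB200 v6 pricing and throughput depend on your model, batch size, and contract terms:

```python
def cost_per_million_tokens(gpu_hour_usd: float,
                            gpus: int,
                            tokens_per_sec_per_gpu: float) -> float:
    """Rough cost per 1M tokens for a serving fleet (illustrative only)."""
    tokens_per_hour = tokens_per_sec_per_gpu * gpus * 3600
    fleet_cost_per_hour = gpu_hour_usd * gpus
    return fleet_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical numbers: note that if input wait halves effective
# throughput, cost per token doubles at the same hourly rate.
print(cost_per_million_tokens(gpu_hour_usd=40.0, gpus=8,
                              tokens_per_sec_per_gpu=2500))   # ~$4.44 / 1M
print(cost_per_million_tokens(gpu_hour_usd=40.0, gpus=8,
                              tokens_per_sec_per_gpu=1250))   # ~$8.89 / 1M
```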
You also get a concrete implementation checklist: how to profile GPU utilization vs. input wait, design NVLink‑aware placement, move from batch ETL to streaming, co‑locate feature stores and vector indexes with GPU domains, and bake telemetry SLOs (NVLink utilization, input latency, queue depth) directly into your ML and MLOps practices. Along the way, Mirko highlights the governance, DLP, and sustainability angles so your AI platform is not just fast, but also compliant and defensible to security, finance, and CSR stakeholders. If you care about turning NVIDIA Blackwell, NVLink, InfiniBand, and Microsoft Fabric into real‑world business value, this episode gives you the language and patterns to have serious conversations with both architects and executives.
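For the telemetry piece of that checklist, here is a minimal sketch using NVIDIA's NVML bindings (`pip install nvidia-ml-py`) that samples per-GPU utilization for an SLO dashboard; NVML also exposes NVLink counters, which are omitted here for brevity:

```python
import pynvml


def sample_gpu_utilization() -> list[dict]:
    """Snapshot per-GPU compute and memory-controller utilization.

    Feed these samples into your metrics backend so that a sustained
    low compute percentage alongside busy input queues can trigger the
    data-starvation alerts the checklist calls for.
    """
    pynvml.nvmlInit()
    samples = []
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            samples.append({
                "gpu_index": i,
                "compute_pct": util.gpu,     # % of time kernels were running
                "mem_ctrl_pct": util.memory, # % of time memory bus was busy
            })
    finally:
        pynvml.nvmlShutdown()
    return samples
```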
WHAT YOU WILL LEARN
- Why most “AI‑ready” data fabrics still starve Blackwell GPUs with I/O waits, latency spikes, and slow storage lanes.
- How Grace‑Blackwell, NVLink, NVL72, and Quantum‑X800 InfiniBand transform rack‑scale throughput and scalability.
- How Azure ND GB200 v6, NVIDIA NIM, and Azure AI Foundry turn Blackwell into a managed, token‑priced AI platform.
- How Microsoft Fabric, streaming ingestion, and modern data pipelines keep LLM training, RL training, and inference continuously fed.