Episode Details
Your GPUs are waiting. And every second they wait, you're burning money.
Description
Welcome to Is That How It Happened? -- the podcast where we dig into the real architecture behind the technology headlines. I'm your host, The Vanimal, and today we're talking about one of the most expensive and least-discussed problems in AI infrastructure: the storage wall.
As AI models push into the trillions of parameters, the bottleneck isn't compute anymore -- it's data delivery. Traditional TCP/IP networks are choking your pipelines, exhausting your CPUs, and leaving your H100 clusters idle while they wait for checkpoints and training data to load. That is not a networking inconvenience. That is a structural failure at scale.
In this episode, we break down how Remote Direct Memory Access -- RDMA -- tears that wall down entirely. We're talking zero-copy data movement, direct paths from storage to GPU memory with the CPU completely out of the loop, and real-world throughput numbers that will reframe how you think about AI infrastructure design.
We cover the TCP/IP bottleneck, the RoCEv2 versus InfiniBand debate, NVIDIA GPUDirect Storage, and how DeepSeek's 3FS architecture is hitting 6.6 terabytes per second of aggregate read bandwidth across 180 storage nodes.
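To put that aggregate number in perspective, here's a quick back-of-the-envelope calculation of what it implies per node. This assumes the quoted 6.6 TB/s is spread evenly across all 180 storage nodes, which real clusters only approximate:

```python
# Per-node read bandwidth implied by the aggregate figure quoted above.
# Assumption (ours, not the episode's): even distribution across nodes.
aggregate_tb_per_s = 6.6   # aggregate read bandwidth, TB/s
storage_nodes = 180

per_node_gb_per_s = aggregate_tb_per_s * 1000 / storage_nodes
print(f"{per_node_gb_per_s:.1f} GB/s per node")  # → 36.7 GB/s per node
```

That's roughly 36.7 GB/s of sustained reads per node -- well beyond what a conventional TCP/IP storage stack delivers, which is exactly why RDMA is central to this story.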
If you're building, buying, or advising on AI infrastructure, this is not optional listening.
Let's get into it.