Episode Details

ReadMultiplex.com: The Hidden Refresh Tax in AI GPU Memory: A 60-Year-Old Flaw That Still Haunts Real-Time AI – And How My 1987 Qfresh Is Finally Killing It.
Episode 39 Published 2 weeks, 2 days ago
Description

It was the summer of 1987, and I was a kid on fire with the early PC revolution. Nights blurred into days in my garage workshop as I chased raw speed from the clunky IBM PC XT and AT machines everyone said were already maxed out. I thought, really? This was not new to me: I had already built the fastest IBM PC AT in history, hot-rodding it from the stock 6 MHz to over 30 MHz. So this was my next exploration. My company was already supplying thousands of 8–16 MHz upgrades to NASA, defense departments, and corporations.

I was alone in my garage with no fancy hardware add-ons, just me, a soldering iron, a logic analyzer, and stacks of Intel datasheets. I was hunting for hidden clock cycles, the kind hardware makers swore you could never touch with code alone. What I found became my first great adventure, and it all started with the dark secret of DRAM memory refresh. Back then every PC used dynamic RAM (DRAM) chips. Unlike static memory, these stored each bit as a tiny leaking capacitor. The charge would drain away in milliseconds, so the hardware had to sweep through every row of the memory array and rewrite the data before it vanished.
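To put rough numbers on that refresh tax, here is a toy back-of-the-envelope model. All values are illustrative assumptions for a classic asynchronous DRAM, not figures from the article or any datasheet: the controller must touch every row within the retention window, and each refresh steals a slice of bus time.

```python
# Toy model of the DRAM refresh tax: the fraction of memory-bus time
# consumed by mandatory row refreshes. All numbers are illustrative
# assumptions, not datasheet values.

ROWS = 256              # rows per chip (e.g. a 64K x 1 DRAM has 256 rows)
RETENTION_MS = 4.0      # every row must be refreshed within this window
REFRESH_CYCLE_NS = 300  # bus time one RAS-only refresh steals

# The controller spreads ROWS refreshes evenly across the retention window.
refresh_interval_us = (RETENTION_MS * 1000) / ROWS   # time between refreshes
overhead = REFRESH_CYCLE_NS / (refresh_interval_us * 1000)

print(f"one refresh every {refresh_interval_us:.3f} us")
print(f"refresh tax: {overhead * 100:.2f}% of bus time")
```

With these made-up but plausible numbers, a refresh fires roughly every 15.6 microseconds and eats about 2% of all bus time, which is the "hidden tax" the article describes: cycles the CPU can never use, paid continuously whether or not any program touches memory.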

Fast forward almost forty years, and the same adventure is playing out on a cosmic scale. Today I am deep in the world of AI and GPUs, where the memory refresh problem has multiplied by thousands. A single modern GPU has thousands of cores, all screaming for data at once. The memory subsystem, whether HBM, GDDR, or plain DDR5, still has to refresh. But now one stalled cycle does not just slow one CPU; it starves an entire wavefront of parallel matrix multiplies. Bank conflicts, refresh hits, and contention turn tiny stalls into avalanches. I found a way to fix this and speed up AI. This is how I did it.
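The amplification effect described above can be sketched with a deliberately simplified model. The bank count, warp count, and address striping below are hypothetical assumptions, not the article's numbers; the point is only that one refreshing bank stalls every parallel worker mapped to it, not just one serial access.

```python
# Sketch of refresh-stall amplification on a wide parallel machine.
# Assumptions (hypothetical, for illustration only): 16 memory banks,
# 1024 concurrent warps each needing one operand this cycle, and
# addresses striped uniformly so warp i touches bank i % BANKS.

BANKS = 16
WARPS = 1024
REFRESH_BANK = 3  # the one bank currently locked for refresh

# On a serial CPU, one refresh delays one access. On the parallel
# machine, the same single refresh simultaneously blocks every warp
# whose next operand lives in the refreshing bank.
stalled = sum(1 for i in range(WARPS) if i % BANKS == REFRESH_BANK)

print(f"{stalled} of {WARPS} warps blocked by a single bank refresh")
```

Under uniform striping that is 64 warps held up at once by one refresh, which is the "tiny stall into avalanche" dynamic: the cost of a refresh scales with how many parallel consumers share the bank, and real HBM/GDDR behavior adds scheduler and timing effects on top of this simple picture.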

Read more at ReadMultiplex.com

If this has any value to you, maybe buy me a coffee: https://ko-fi.com/brianroemmele

