Episode Details

How Nvidia Made Its ASR Models 3x Faster Than the Competition

Published 19 hours ago
Description

This story was originally published on HackerNoon at: https://hackernoon.com/how-nvidia-made-its-asr-models-3x-faster-than-the-competition.
A technical deep dive into Nvidia’s Token-and-Duration Transducer architecture and how it achieves faster ASR inference with competitive accuracy.
Check more stories related to tech-stories at: https://hackernoon.com/c/tech-stories. You can also check exclusive content about #nvidia-parakeet, #hugging-face, #asr, #tdt, #speech-recognition, #token-and-duration-transducer, #nvidia-asr-models, #good-company, and more.

This story was written by: @speechmatics. Learn more about this writer by checking @speechmatics's about page, and for more stories, please visit hackernoon.com.

Nvidia's Parakeet models sit roughly 3x clear of the rest of the Hugging Face Open ASR Leaderboard on throughput, with competitive accuracy. The reason is the Token-and-Duration Transducer (TDT), a small modification to RNN-T that adds a second output head predicting how many encoder frames each emitted token covers. Instead of advancing one frame at a time, the decoder skips ahead by the predicted duration. The result is up to 2.82x faster inference at a comparable or better word error rate.
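The frame-skipping idea above can be sketched in a few lines. This is a simplified, hypothetical greedy decode over precomputed per-frame logits, not Nvidia's actual implementation: a real transducer conditions the joint network on the prediction-network state, and real TDT allows a duration of 0 (multiple tokens at the same frame); here we force an advance of at least one frame to guarantee termination.

```python
import numpy as np

def tdt_greedy_decode(token_logits, duration_logits, blank_id=0):
    """Sketch of TDT-style greedy decoding.

    Hypothetical inputs: token_logits [T, V] and duration_logits [T, D],
    where D-1 is the maximum number of frames a token may cover.
    Returns the emitted non-blank tokens and the number of decode steps.
    """
    T = token_logits.shape[0]
    tokens = []
    t = 0
    steps = 0
    while t < T:
        steps += 1
        tok = int(np.argmax(token_logits[t]))       # token head
        dur = int(np.argmax(duration_logits[t]))    # duration head: frames covered
        if tok != blank_id:
            tokens.append(tok)
        # TDT: jump by the predicted duration instead of a single frame.
        # Simplification: advance at least 1 frame so the loop always terminates.
        t += max(dur, 1)
    return tokens, steps
```

With a plain RNN-T greedy loop the decoder would touch all T frames; here, if the duration head predicts 2 frames per token, the loop runs only T/2 steps, which is where the inference speedup comes from.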

