How Nvidia Made Its ASR Models 3x Faster Than the Competition
Description
This story was originally published on HackerNoon at: https://hackernoon.com/how-nvidia-made-its-asr-models-3x-faster-than-the-competition.
A technical deep dive into Nvidia’s Token-and-Duration Transducer architecture and how it achieves faster ASR inference with competitive accuracy.
Check more stories related to tech-stories at: https://hackernoon.com/c/tech-stories.
You can also check exclusive content about #nvidia-parakeet, #hugging-face, #asr, #tdt, #speech-recognition, #token-and-duration-transducer, #nvidia-asr-models, #good-company, and more.
This story was written by: @speechmatics. Learn more about this writer by checking @speechmatics's about page, and for more stories, please visit hackernoon.com.
Nvidia's Parakeet models lead the rest of the Hugging Face Open ASR Leaderboard by 3x on throughput while staying competitive on accuracy. The reason is the Token-and-Duration Transducer (TDT), a small modification to RNN-T that adds a second output head predicting how many encoder frames each token covers. Instead of advancing one frame at a time, the decoder jumps ahead by the predicted duration, so it evaluates far fewer frames per utterance. The result is up to 2.82x faster inference than a conventional transducer at comparable or better word error rate.
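To make the frame-skipping idea concrete, here is a minimal sketch of a greedy decoding loop with a token head and a duration head. It is an illustration under stated assumptions, not Nvidia's implementation: the duration set `(0, 1, 2, 3, 4)`, the `joint` callable, the `max_symbols_per_frame` guard, and the scripted `toy_joint` are all hypothetical stand-ins for a real encoder, prediction network, and joint network.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0                    # assumed blank token id
DURATIONS = (0, 1, 2, 3, 4)  # assumed set of frame skips the duration head can pick

# Hypothetical joint: (frame index, token history) -> (token scores, duration scores)
Joint = Callable[[int, List[int]], Tuple[Sequence[float], Sequence[float]]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def tdt_greedy_decode(
    num_frames: int,
    joint: Joint,
    durations: Sequence[int] = DURATIONS,
    max_symbols_per_frame: int = 10,
) -> Tuple[List[int], int]:
    """Greedy decode; returns (emitted tokens, number of joint evaluations)."""
    t = 0
    tokens: List[int] = []
    joint_calls = 0
    emitted_here = 0  # non-blank tokens emitted without leaving frame t
    while t < num_frames:
        token_scores, duration_scores = joint(t, tokens)
        joint_calls += 1
        token = argmax(token_scores)
        skip = durations[argmax(duration_scores)]
        if token != BLANK:
            tokens.append(token)
            emitted_here += 1
        # Guard against stalling: a blank with duration 0, or too many
        # emissions on one frame, is forced to advance by one frame.
        if skip == 0 and (token == BLANK or emitted_here >= max_symbols_per_frame):
            skip = 1
        if skip > 0:
            emitted_here = 0
        t += skip  # the key step: jump ahead by the predicted duration
    return tokens, joint_calls


def toy_joint(t: int, history: List[int]) -> Tuple[List[float], List[float]]:
    """Scripted stand-in: a new token every 4 frames, each covering 4 frames."""
    token_scores = [0.0] * 4
    duration_scores = [0.0] * len(DURATIONS)
    if t % 4 == 0:
        token_scores[1 + t // 4] = 1.0  # frames 0, 4, 8 -> tokens 1, 2, 3
        duration_scores[4] = 1.0        # index 4 selects the largest skip
    else:
        token_scores[BLANK] = 1.0
        duration_scores[1] = 1.0        # index 1 selects a one-frame skip
    return token_scores, duration_scores
```

On 12 frames, `tdt_greedy_decode(12, toy_joint)` evaluates the joint 3 times. Passing `durations=(1, 1, 1, 1, 1)`, which forces every step to advance a single frame, yields the same tokens after 12 evaluations. The toy numbers say nothing about real-world speedups; they only show where the savings come from.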