Episode Details
India’s AI still doesn’t speak India. Can it?
Description
ChatGPT butchers Punjabi with spelling errors and Bollywood-style Hindi bleeding through. Hindi bots trained on newspapers miss dialects like Awadhi and Bhojpuri entirely, while Tamil AI ignores the rich variations between Kongu and Madurai speech.
Sure, Gurugram collected ₹200 crore in taxes using Hindi AI calls, but that's because Hindi dominates datasets. Most other languages remain stuck in translation hell. Private companies optimize for speed over nuance, government corpora like Bhashini sit underused, and multimodal data that captures tone and emotion is too expensive to build.
The result? AI is flattening India's 780 languages into sanitized, standardized versions that erase the very dialects it claims to serve.
Read the newsletter here. Find the Duolingo article here.
Daybreak is produced from the newsroom of The Ken, India’s first subscriber-only business news platform. Subscribe for more exclusive, deeply-reported, and analytical business stories.