India’s AI dream is getting lost, but where?

The Economic Times (India)

When Indian Prime Minister Narendra Modi hosted world leaders and tech chiefs earlier this year in New Delhi, he declared that AI must be “democratized” and a “medium of inclusion and empowerment, especially across the Global South.” It’s a convenient vision for Silicon Valley, which is in the midst of an ongoing land grab for the lucrative market.

Young, tech-savvy and mobile-first, India has become one of the most important growth regions for AI, ranking behind only the US in usage for both OpenAI’s ChatGPT and Anthropic’s Claude. But the key to both the “inclusion” and business dreams of tech diffusion is overcoming the barriers to speech.

India has nearly two dozen official languages and more than a hundred dialects.

If AI can’t close this gap, it will just become another technology that divides the English-speaking elite and everyone else.

True localization will depend on whether models can comprehend Bengali voice notes, Gujarati payment queries, and code-switched Hindi-English business calls: all the messy, real-world spoken words that drive daily commerce and public life. More than a billion people speak Indic languages.

Yet one study found that GPT-5 achieved only about 45% accuracy on a human-curated benchmark covering 11 of them, including Modi’s mother tongue, Gujarati. The first generation of AI tools was trained on internet text, the majority of which is in English.

Technical improvements and better datasets have helped recent models improve in non-English and so-called “low-resource languages,” those with less data to train on.

But the language gap persists, especially for speech, which is forecast to become the next mass way of interacting with models. “Voice is the most intuitive interface for humans, especially in more developing regions,” Sandeep Chinchali, the co-founder of Poseidon, an Andreessen Horowitz-backed data infrastructure startup, told me.

And South Asia, he added, “uses voice for everything,” with businesses running through phone calls, WhatsApp voice memos, speech-based payments and, increasingly, voice-enabled coding tools.

AI systems that can’t comprehend these interactions will be useless in automating this work, not to mention potentially dangerous in public services. One problem is a lack of proper benchmarks for non-English models.

Leading ones, for example, can’t even agree on what proper Bengali — a language spoken by more than 280 million — should look like.

The heart of the issue is still data, Chinchali says, and not just quantity (Bengali makes up less than 0.1% of web text) but also quality. Spoken Indic languages add another layer of difficulty: regional variants, background noise, and frequent code-switching in technical and financial conversations.

Speech data for AI training requires accurate transcription, longer clips, varied acoustic environments, demographic and regional variants, and careful human review before it can truly improve AI models.

Systems trained on narrower datasets often fail in the real world, where conversations mix local slang and borrowed English words in varied settings. India Inc. understands the stakes.

Cracking the language challenge has become central to the country’s broader sovereign AI push.

Building AI that works “at India’s scale” presents a massive opportunity, Pratyush Kumar, the co-founder of domestic AI hopeful Sarvam AI, said in a statement this month, announcing a new funding round.

That means models that “understand our voices” and “read our documents.” In April, the startup that so many are pinning their hopes on for catching up in a US-China race launched a new evaluation for Indic speech recognition, arguing that the standard metrics were not built for these languages and can distort how such systems are judged. US tech giants are paying attention, too.

OpenAI last year unveiled a framework for evaluating AI systems on Indian culture and language.

And Modi’s government has launched a translation platform that also collects spoken data to improve multilingual models. But crowdsourcing is no cure-all.

High standards and human curation remain vital.

A Stanford team warned in a paper last year that quality has become a key challenge when trying to scale such endeavors.

It also raised ethical questions in a sector with a long history of poor pay and exploitation. Poseidon’s Chinchali says it worked with a supplier in India that is committed to fair pay and is exploring blockchain tools that would give contributors even more say over how their data is deployed, for example by ensuring it is used by companies within their own country rather than leaked out to train foreign AI tools.

These are good steps and should become the baseline, not the exception. The government is already forcing high school students to learn three languages, including two indigenous ones.

If Modi is serious about making India an AI superpower, he should demand something similar from model builders and craft policy that ensures systems can truly comprehend the country’s linguistic diversity. There’s also a safety issue.

As AI moves into schools, hospitals, courts and public service, language failures have consequences.

Safety alignment tends to deteriorate when people engage with AI in low-resource languages, researchers have found.

It means the people most likely to be left behind by the tech revolution may also be the least protected from its risks. Bridging the language divide is key to Silicon Valley cracking this next growth market.

And Modi’s promise that it will deliver “happiness for all, welfare for all” will ring hollow if the technology cannot even understand the people it claims to empower.
