Multilingual RAG for India — Bhashini hooks and cross-lingual retrieval

The constraint

A bank serving 12 Indian languages. The user types in Marathi; the corpus might be in English; the answer needs to come back in Marathi. Standard RAG, three languages deep.

Doing it the naive way — separate embedding model per language, separate retrieval index per language — explodes complexity. Doing it well needs cross-lingual embeddings and a translation hop.

The shape

User query (Marathi)
       ↓
Translate? Or use cross-lingual embedding directly?
       ↓
Retrieve from a single index (cross-lingual)
       ↓
Generate answer in the user's language

Two architectural choices: cross-lingual embeddings (one index, all languages) or translate-then-retrieve (per-language indexes).

Cross-lingual embeddings

Some embedding models (Cohere’s multilingual, mE5, BGE-M3) map semantically similar content across languages to nearby vectors. The Marathi query “रिफंड पॉलिसी काय आहे?” lands near English “what is the refund policy” in the embedding space.

This means one index serves all languages. Storage stays single-corpus; retrieval stays single-query.

Bhashini for the translation hops

Bhashini is the Indian government’s language stack — speech-to-text, text-to-text translation, text-to-speech, across all 22 scheduled Indian languages.

For the parts where you do need translation (the LLM’s final answer is in English; the user wants Marathi), Bhashini provides production-grade APIs:

client := bhashini.New(apiKey)
marathi, _ := client.Translate(ctx, &TranslateRequest{
    SourceLang: "eng",
    TargetLang: "mar",
    Text:       englishAnswer,
})

Bhashini’s translation quality for Hindi/Marathi/Tamil/Bengali is dramatically better than the English-trained models that have to bootstrap into these languages.

The pipeline that worked

For one Indian banking pilot:

User submits query in Marathi.
Embed query directly with BGE-M3 (cross-lingual).
Retrieve from English corpus (the bank’s policies are English-source).
LLM generates answer in English (better quality on technical content).
Bhashini translates English answer → Marathi.
Return Marathi answer with citations to English source documents.

Two translation hops (none on retrieval, one on the answer). Citation quality preserved; user-facing language honoured.

What didn’t work

Single LLM doing English retrieval + Marathi generation in one prompt. Quality drops on both ends.
Per-language indexes. Storage and ingestion cost 12×. Maintenance is brutal.
Voice-to-voice without a text intermediate. Bhashini ASR + TTS quality is good; relying purely on voice loses too much fidelity for banking.

Where this is going

Indic LLMs are getting better fast. The pipeline I’d build today might not need the translation hops in 18 months — the LLM might generate directly in the target language at acceptable quality.

For now, the cross-lingual embedding + English LLM + Bhashini translation pattern is the production-ready answer. The architecture is documented for one of the engagements I worked on; the patterns generalise across any multi-language Indian deployment.