Best-practice agent architecture for stable multi-language handling (language: "multi")

Context

We run voice agents for hotels with a large number of international guests, so callers may speak many different languages. We currently use the language: “multi” setting so a single agent can handle whatever language the caller uses.

What we’d like help with

  1. Multi-language support going forward. We’d like to keep using multi, or have you suggest a better approach for letting one agent handle as wide a range of caller languages as possible. Restricting to a small fixed language set isn’t a good fit for our use case.
  2. Reported increase in wrong-language transcription. Over the last week, people testing our agents in Japanese have noticed an increase in speech being transcribed into the wrong language. We haven’t yet been able to isolate a clean-audio reproduction, so we’re raising this as an observed trend rather than a confirmed bug, but we’d like to know whether anything changed in the last week or so on the STT/ASR or language-detection side for multi agents.
  3. Recommended architecture & settings. What agent architecture and settings do you recommend to support a wide range of caller languages while keeping language handling stable well into a call (i.e. it shouldn’t lock onto an incorrect language mid-conversation and fail to recover)? For example: transcription mode, ASR provider choice, multi-agent/language-routing patterns, or any configuration that improves robustness without restricting the supported languages.

Our setup

  • language: “multi” on all agents
  • Transcription mode / ASR provider: using Retell defaults (not explicitly configured)

We ran into a similar architecture question and moved away from relying on one “multi” agent to handle everything end-to-end.

What worked better for us was creating language-specific versions of the agent instead of one universal multi-language agent. Each version had:

  • the target language set directly

  • an accent/voice aligned to that caller group

  • the prompt written in that same language

  • culturally appropriate phrasing, formality, and local conversational patterns

  • in some cases, a dedicated phone number or routing path for that language

It is a bit more work architecturally, because you end up with more of an agent swarm than a single agent. But the client-side experience was noticeably better. Callers do not just need the agent to technically understand the language — they need to feel like the agent is actually speaking to them naturally in their language.

For Japanese specifically, this mattered a lot. The agent needed to sound more structured, polite, and culturally aligned, not just translated. Things like honorifics, formality level, pacing, and respectful phrasing made a real difference. A generic multi-language agent could technically respond in Japanese, but it did not feel as native or trustworthy as a dedicated Japanese agent with the right prompt, voice, and language settings.

So my recommendation would be:

Use “multi” only as a front-door routing layer if you need broad intake, but do not rely on one multi agent for the entire guest experience in high-value languages. For the languages that matter most to your hotel traffic, create dedicated language agents and route callers into the right one as early as possible.

A practical setup could be:

  1. Main intake/routing agent detects the caller’s preferred language.

  2. If the language is high-volume or strategically important, transfer to the dedicated language-specific agent.

  3. If it is a lower-volume language, keep the caller on the multi agent as a fallback.

  4. Give major language groups their own phone numbers, IVR options, or booking-flow entry points where possible.

This does add more maintenance, but it makes the system much more stable. You avoid the agent locking onto the wrong language mid-call, and you get a much better experience because each agent is tuned for that caller’s actual language, accent, and cultural expectations.
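The intake-and-transfer logic above comes down to a small lookup. Here is a minimal Python sketch, assuming the router has already produced a language code such as `ja-JP`. `RoutingTable`, `choose_agent`, and the agent IDs are hypothetical names for illustration, not Retell APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingTable:
    # Primary language subtag (e.g. "ja") -> dedicated agent ID (hypothetical IDs).
    dedicated: dict[str, str]
    # The broad "multi" agent that keeps every other language.
    fallback_agent: str


def choose_agent(detected_language: Optional[str], table: RoutingTable) -> str:
    """Pick the agent the intake router should transfer to.

    High-volume languages go to their dedicated agent; lower-volume or
    undetected languages stay on the multi fallback agent.
    """
    if not detected_language:
        return table.fallback_agent
    # "ja-JP", "JA" and "ja" all reduce to the primary subtag "ja".
    primary = detected_language.strip().lower().split("-")[0]
    return table.dedicated.get(primary, table.fallback_agent)


table = RoutingTable(
    dedicated={
        "ja": "agent_japanese_example",
        "es": "agent_spanish_example",
        "fr": "agent_french_example",
    },
    fallback_agent="agent_multi_example",
)

print(choose_agent("ja-JP", table))  # agent_japanese_example
print(choose_agent("fi-FI", table))  # agent_multi_example (lower-volume language)
print(choose_agent(None, table))     # agent_multi_example (detection failed)
```

Keeping the table in one place means adding a new dedicated agent is a one-line change, and anything not listed automatically stays on the multi fallback.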

Hello @andrew3

Thanks for the details,

  1. The generic Multilingual (multi) value is now flagged as a legacy setting that only covers 10 languages (English-US, Spanish-ES, French-FR, German-DE, Hindi-IN, Russian-RU, Portuguese-PT, Japanese-JP, Italian-IT, Dutch-NL). Retell recommends switching the language selector to Multiselect and picking the specific languages you actually need — narrower sets are more accurate. For very broad coverage, Soniox supports multilingual code-switching across ~50 languages versus Deepgram’s 10.

  2. Could you share some call IDs and tell us which model you are using?

  3. Recommended architecture for stable handling.

  • If you can determine the caller’s language up front (CRM, dialed number), keep agents single-language and override per call via the inbound call webhook — best accuracy and avoids mid-call drift.

  • Otherwise, use Multiselect with the minimum set of languages you truly need. Retell auto-routes: all-within-Deepgram-10 → Deepgram multilingual; broader sets → Soniox.

  • Set transcription mode to “optimized for accuracy” (cascading agents) to reduce mid-sentence misfires.

  • Add Boosted Keywords (hotel name, room types, location names) — up to 100.

  • Tune denoising mode; for clean audio, “No Denoising” can improve accuracy.

  • Note: voice pronunciation falls back to the first language in your selected list if detection fails, so order matters.

Thank You
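The per-call decision described in this reply can be written down as a small function, which may help when auditing many hotel numbers at once. This Python sketch makes no Retell API calls. `plan_call`, `expected_stt_provider`, and the number-to-language map are hypothetical. The Deepgram/Soniox split only mirrors what support describes in this thread and may change, so check Retell's current docs before relying on it. In a real deployment, the single-language branch would correspond to the per-call override in the inbound call webhook, whose payload fields are not modeled here.

```python
from typing import Optional

# Languages the reply lists for the legacy "multi" setting,
# i.e. the set Deepgram multilingual covers.
DEEPGRAM_MULTI = frozenset({
    "en-US", "es-ES", "fr-FR", "de-DE", "hi-IN",
    "ru-RU", "pt-PT", "ja-JP", "it-IT", "nl-NL",
})


def expected_stt_provider(selected: list[str]) -> str:
    """Which ASR the described auto-routing would pick for a Multiselect list."""
    if not selected:
        raise ValueError("select at least one language")
    return "deepgram" if set(selected) <= DEEPGRAM_MULTI else "soniox"


def plan_call(dialed_number: str,
              number_languages: dict[str, str],
              multiselect: list[str]) -> dict[str, str]:
    """Decide per call: a known language wins, otherwise use Multiselect."""
    language: Optional[str] = number_languages.get(dialed_number)
    if language is not None:
        # Language known up front (dedicated number or CRM record):
        # run a single-language agent, no mid-call detection needed.
        return {"mode": "single", "language": language}
    if not multiselect:
        raise ValueError("select at least one language")
    return {
        "mode": "multiselect",
        "stt": expected_stt_provider(multiselect),
        # Voice pronunciation falls back to the first selected language
        # if detection fails, so list order matters.
        "voice_fallback": multiselect[0],
    }


# Hypothetical dedicated numbers (555-01xx is reserved for fictional use).
NUMBERS = {"+12025550101": "ja-JP", "+12025550102": "es-ES"}

print(plan_call("+12025550101", NUMBERS, ["en-US", "ja-JP"]))
print(plan_call("+12025550199", NUMBERS, ["en-US", "ja-JP", "ko-KR"]))
```

Adding a single language outside the ten (here `ko-KR`) is enough to move the whole call to the broader provider, which is why the reply suggests keeping the selected set as small as you can.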

Hi, I have the same architecture as you: one router agent that gets the caller’s language simply by asking which language they speak.

Then there are multiple language agents, each set up with a native 11Labs voice.

The problem I keep having is that the router agent seems to hear the caller correctly but then just continues in English. I had it on Deepgram and then switched to Soniox, but I’m not sure whether that helps.

Is there anything else I can do to detect the caller’s language?

Router agent ID, if that helps: agent_fd75b5ddc39228efa740f2eb9d

Yeah, we’ve seen this pattern too. In my opinion the router agent should still be treated as a very deliberate language-detection agent, not just a normal agent that happens to ask “what language do you speak?”

A few things I would try:

First, make sure the router agent itself is set up for multi-language detection. If the router is effectively biased toward English, it may “hear” the caller correctly but still continue in English because the prompt, default language, or response behavior is pulling it back there.

Second, I would make the router prompt much heavier around language detection. Don’t only say “ask what language the caller speaks.” Add repeated, explicit instructions like:

  • Your only job is to identify the caller’s preferred language.

  • Do not continue the conversation in English unless the caller clearly chooses English.

  • If the caller answers in Spanish, Japanese, French, Arabic, etc., immediately classify that language and transfer to the matching agent.

  • If you are unsure, ask one short clarification question in simple multilingual wording.

  • Never assume English just because detection is uncertain.

I’d also include the language-selection instruction in several major languages inside the router prompt. For example, if you support Japanese, Spanish, French, German, etc., include short examples of how a caller might answer in each language and what the router should do. That gives the model more anchors than a single English-only instruction.

For example:

Caller says “日本語,” “Japanese,” “nihongo,” or answers in Japanese → route to Japanese agent.
Caller says “español,” “Spanish,” “hablo español,” or answers in Spanish → route to Spanish agent.
Caller says “français,” “French,” or answers in French → route to French agent.

The goal is to make the router behave less like a conversational agent and more like a language-classification step.
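If you support many languages, generating these example lines from a single table keeps the router prompt in sync with the agents you actually have. Here is a minimal Python sketch. `Route` and `build_routing_examples` are hypothetical helpers, and the output is plain text you would paste into the router prompt. It does not configure anything in Retell.

```python
from typing import NamedTuple


class Route(NamedTuple):
    language: str       # language name as written in the prompt
    agent: str          # agent (or transfer target) for that language
    phrases: list[str]  # ways a caller might name the language


def build_routing_examples(routes: list[Route]) -> str:
    """Render one "Caller says ... → route to ..." line per supported language."""
    lines = []
    for route in routes:
        said = ", ".join(f'"{p}"' for p in route.phrases)
        lines.append(
            f"Caller says {said}, or answers in {route.language} "
            f"→ route to {route.agent}."
        )
    return "\n".join(lines)


ROUTES = [
    Route("Japanese", "Japanese agent", ["日本語", "Japanese", "nihongo"]),
    Route("Spanish", "Spanish agent", ["español", "Spanish", "hablo español"]),
    Route("French", "French agent", ["français", "French"]),
]

print(build_routing_examples(ROUTES))
```

When you add a dedicated agent, you add one `Route` entry and regenerate the block, so no language ends up with an agent but no examples (or the reverse).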

Third, I would keep the router extremely short. It should not try to help, explain, or continue the booking/service flow. It should ask for language, detect it, confirm only if needed, and transfer. The longer it stays in the conversation, the more chance it has to drift back into English.

Something like:

“Hi, what language would you like to continue in?”

Then once the caller responds, the router should immediately route. No extra small talk.

Soniox vs Deepgram may help, but I don’t think ASR alone solves this if the router prompt is under-specified. The router needs strong prompt reinforcement, multilingual examples, and a very narrow job: detect language, do not continue the call, transfer to the correct native-language agent.

For your setup, I’d probably strengthen the router prompt first before changing more ASR settings. Multi-language detection is good, but it still needs a lot of structure around it if you want reliable routing across many languages.

Our team likes to ‘overwhelm’ problems with solutions until they are bent into the shape we want. At least with prompting.

It works so we go with it.

This is the prompt I use for the routing agent (translated from Dutch). I think it fits your description:

You are BRAM, a safety-registration assistant. Your only task is to determine the caller’s language and transfer them immediately to the correct language agent.

**LISTENING:** Always wait until the caller has completely stopped speaking before you respond. Never respond to the content of a report; that is the language agent’s job.

**LANGUAGE DETECTION:** Detect the language based on multiple words, not on a single word. If someone says “OK”, “Yes”, “Hallo”, “Hello”, or other international words, wait for more context. If someone simply starts talking without naming the language, detect the language from the words they speak.

Language recognition: recognize both the spoken language AND the name of the language:

  • Nederlands / Dutch / Niederländisch / Holenderski → transfer_nederlands
  • Engels / English / Englisch / Angielski / İngilizce → transfer_english
  • Duits / German / Deutsch / Niemiecki / Almanca → transfer_deutsch
  • Pools / Polish / Polnisch / Polski / Lehçe → transfer_polski
  • Turks / Turkish / Türkisch / Turecki / Türkçe → transfer_turkce
  • Slowaaks / Slovak / Slowakisch / Slovenčina → transfer_slovencina
  • Spaans / Spanish / Spanisch / Español → transfer_espanol
  • Arabisch / Arabic / Arabisch / العربية → transfer_arabisch

**TRANSFERRING:** Transfer immediately as soon as the language is clear. Do not ask any further questions.

**FALLBACK:** If the language is still unclear after one repetition, transfer via transfer_nederlands.

The agent is set to multilingual, using Multiselect with the different languages selected.

This is the first message the agent says:

Welcome Message

  • Pause Before Speaking: 1.0s
  • AI speaks first
  • Custom message

Ik ben BRAM, de veiligheidsassistent. Welke taal spreekt u? … I’m BRAM, the safety assistant. Which language do you speak?

It’s all in Dutch, as that is the country we operate in and its main language.
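BRAM’s routing rules (multi-word detection, named-language lookup, one repetition, then the Dutch fallback) can also be expressed as a plain function. This can be useful for replaying test transcripts offline and checking that the name table covers what callers actually say. Here is a minimal Python sketch. It is not how Retell executes the prompt (the LLM does that). `route`, `spoken_guess`, `MIN_WORDS`, and `AMBIGUOUS_WORDS` are illustrative assumptions. Treating a named language as decisive, even when it is a single word, is one interpretation of the multi-word rule.

```python
import re
import unicodedata
from typing import Optional

# Language names from the routing prompt above, grouped by transfer tool.
LANGUAGE_NAMES = {
    "transfer_nederlands": ["Nederlands", "Dutch", "Niederländisch", "Holenderski"],
    "transfer_english": ["Engels", "English", "Englisch", "Angielski", "İngilizce"],
    "transfer_deutsch": ["Duits", "German", "Deutsch", "Niemiecki", "Almanca"],
    "transfer_polski": ["Pools", "Polish", "Polnisch", "Polski", "Lehçe"],
    "transfer_turkce": ["Turks", "Turkish", "Türkisch", "Turecki", "Türkçe"],
    "transfer_slovencina": ["Slowaaks", "Slovak", "Slowakisch", "Slovenčina"],
    "transfer_espanol": ["Spaans", "Spanish", "Spanisch", "Español"],
    "transfer_arabisch": ["Arabisch", "Arabic", "العربية"],
}
# International words that say nothing about the caller's language.
AMBIGUOUS_WORDS = {"ok", "okay", "yes", "hallo", "hello"}
FALLBACK_TOOL = "transfer_nederlands"
MIN_WORDS = 2  # hypothetical reading of "multiple words"


def normalize(text: str) -> str:
    """Casefold and strip diacritics so "Español" and "espanol" match."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


NAME_TO_TOOL = {
    normalize(name): tool
    for tool, names in LANGUAGE_NAMES.items()
    for name in names
}


def route(transcript: str, spoken_guess: Optional[str], attempt: int) -> Optional[str]:
    """Return a transfer tool, or None to ask the caller once more.

    attempt is 1 for the first answer and 2 after one repetition;
    spoken_guess is the tool suggested by whatever language detector you use.
    """
    words = re.findall(r"\w+", normalize(transcript))
    # A named language is decisive, even as a single word.
    for word in words:
        if word in NAME_TO_TOOL:
            return NAME_TO_TOOL[word]
    # Otherwise only trust the detector on enough non-generic words.
    meaningful = [w for w in words if w not in AMBIGUOUS_WORDS]
    if spoken_guess is not None and len(meaningful) >= MIN_WORDS:
        return spoken_guess
    # Still unclear after one repetition: fall back to Dutch.
    if attempt >= 2:
        return FALLBACK_TOOL
    return None


print(route("Ik spreek Nederlands", None, attempt=1))
print(route("Hello", "transfer_english", attempt=1))
print(route("Ich möchte einen Unfall melden", "transfer_deutsch", attempt=1))
```

Diacritics are stripped before matching, so ASR output such as “turkce” or “espanol” still hits the table. A bare “Hello” returns `None` (ask again) rather than defaulting to English.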