Handling city name variations in voice recognition

We currently have the following scenario:

When a call enters the flow, the agent asks the customer which city they would like to be served in or to schedule an appointment in. After the customer responds, the prompt identifies the city mentioned and normalizes the result into a variable.

At the moment, we use the following normalization logic:

If the identified city is one of the cities served by our network (for example: Porto Alegre, Canoas, Blumenau, Lajeado, Campo Bom, Canela, Caxias do Sul, Erechim, Esteio, Gaspar, Gramado, Gravataí, Jaraguá do Sul, Novo Hamburgo, Rio Grande, or São Leopoldo), the variable is set to cidade_credenciada.

Otherwise, the variable is set to outra.
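To make the current rule concrete, here is a minimal Python sketch of the normalization as described, assuming an exact match after case and accent folding. The helper names (`fold`, `normalize_city`) are hypothetical and not part of Retell. It also shows the failure mode this ticket is about: an ASR variant such as “Canoa” falls through to outra.

```python
import unicodedata

# Cities served by the network, as listed in this ticket.
SERVED_CITIES = [
    "Porto Alegre", "Canoas", "Blumenau", "Lajeado", "Campo Bom", "Canela",
    "Caxias do Sul", "Erechim", "Esteio", "Gaspar", "Gramado", "Gravataí",
    "Jaraguá do Sul", "Novo Hamburgo", "Rio Grande", "São Leopoldo",
]


def fold(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace ("São  Leopoldo" -> "sao leopoldo")."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


SERVED_KEYS = {fold(city) for city in SERVED_CITIES}


def normalize_city(city: str) -> str:
    """Exact match after folding: served city -> "cidade_credenciada", anything else -> "outra"."""
    return "cidade_credenciada" if fold(city) in SERVED_KEYS else "outra"


print(normalize_city("gravatai"))  # cidade_credenciada
print(normalize_city("Canoa"))     # outra: ASR variants are not caught by an exact match
```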

After that, the flow uses this variable to route the call to the corresponding trunk within our PBX.

The problem we are facing is that in phone calls, speech recognition sometimes returns variations of the city name due to pronunciation or transcription differences, for example:

  • Canoas may be transcribed as Canoa

  • Canela may be transcribed as Camela

  • Esteio may be transcribed as Isteio

  • Gravataí may appear with similar phonetic variations in the transcription
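One deterministic complement to the prompt is an explicit alias table that maps known mishearings back to the intended city. This is a hypothetical sketch: it assumes the normalization runs in your own code (for example behind a custom function), and the variants are the ones reported in this ticket plus those that appear next to each city in the prompt quoted later in the thread.

```python
import unicodedata
from typing import Optional

SERVED_CITIES = [
    "Porto Alegre", "Canoas", "Blumenau", "Lajeado", "Campo Bom", "Canela",
    "Caxias do Sul", "Erechim", "Esteio", "Gaspar", "Gramado", "Gravataí",
    "Jaraguá do Sul", "Novo Hamburgo", "Rio Grande", "São Leopoldo",
]

# Observed ASR variants mapped to the city the caller meant (extend from real call transcripts).
KNOWN_VARIANTS = {
    "Canoa": "Canoas",
    "Camela": "Canela",
    "Isteio": "Esteio",
    "Ele Sim": "Erechim",
    "Gravar está aí": "Gravataí",
    "Gravata aí": "Gravataí",
}


def fold(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


# Single lookup keyed by folded text: canonical names map to themselves, variants to their city.
LOOKUP = {fold(city): city for city in SERVED_CITIES}
LOOKUP.update({fold(variant): city for variant, city in KNOWN_VARIANTS.items()})


def resolve_city(heard: str) -> Optional[str]:
    """Return the canonical served city for the ASR text, or None if it is not a known form."""
    return LOOKUP.get(fold(heard))


print(resolve_city("Camela"))  # Canela
print(resolve_city("ISTEIO"))  # Esteio
```

The table only catches variants you have already seen, so it is worth growing it from real transcripts rather than treating it as complete.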

We would like to know if there is any recommended prompt structure or best practices in Retell to handle speech variations or transcription differences for the same city, while keeping the recognition fast and reliable.

Any example of a prompt or architecture that works well for this type of scenario (city identification in phone calls) would be greatly appreciated.


Hi,

It sounds like you need best practices to reliably capture a caller’s city despite ASR variations (e.g., “Canoa” for “Canoas”) so your flow can route via your PBX.

Recommendations you can apply now:

  • Boost the city list in transcription
    • Add the names of the cities you serve to the agent's boosted_keywords so the recognizer favors them.
  • Prefer accuracy when capturing entities
    • For the city-capture turn, use the transcription mode “optimize for accuracy” to improve entity capture (adds ~200 ms) (see: Transcription mode).
  • Set the correct recognition language
    • Ensure the language matches your callers (e.g., Brazilian Portuguese), or use multi if you expect mixed-language input (see: Agent language).
  • Tighten the prompt structure around the capture step
    • Use a sectional prompt with explicit task steps; after the user provides a city, confirm and normalize it before proceeding (see: Prompt structure; Single-prompt guide).
    • Include guidance to infer through minor ASR errors, and ask a natural confirmation question if unsure, without mentioning a “transcription error” (see: LLM prompt guideline).
  • Architect for deterministic routing
    • If the flow branches on multiple conditions, consider a Conversation Flow or Multi‑Prompt setup that isolates a “City Capture” state, confirms the city, and calls your routing tool only after a validated value is set (see: Prompt overview; Prompt engineering guide).
    • If you use a custom function to set the PBX trunk, state explicitly when to call it, e.g., “Once the city is confirmed, call set_trunk(city_normalized).” (see: Custom function).
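The “infer through minor ASR errors, confirm if unsure” guidance can be sketched as a three-tier decision on a similarity score. This assumes the logic runs in your own backend; it uses Python's standard `difflib.SequenceMatcher` ratio as a stand-in for phonetic similarity, and the thresholds (0.95 to accept, 0.75 to confirm) are hypothetical values to tune against real calls.

```python
import difflib
import unicodedata
from typing import NamedTuple, Optional

SERVED_CITIES = [
    "Porto Alegre", "Canoas", "Blumenau", "Lajeado", "Campo Bom", "Canela",
    "Caxias do Sul", "Erechim", "Esteio", "Gaspar", "Gramado", "Gravataí",
    "Jaraguá do Sul", "Novo Hamburgo", "Rio Grande", "São Leopoldo",
]


def fold(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


class Decision(NamedTuple):
    action: str          # "accept", "confirm" or "other"
    city: Optional[str]  # best served-city candidate (None when action is "other")
    score: float         # difflib similarity ratio in [0, 1]


def decide(heard: str, accept_at: float = 0.95, confirm_at: float = 0.75) -> Decision:
    """Pick the most similar served city, then accept it, ask to confirm it, or treat as other."""
    folded = fold(heard)
    best_city, best_score = None, 0.0
    for city in SERVED_CITIES:
        score = difflib.SequenceMatcher(None, folded, fold(city)).ratio()
        if score > best_score:
            best_city, best_score = city, score
    if best_score >= accept_at:
        return Decision("accept", best_city, best_score)
    if best_score >= confirm_at:
        return Decision("confirm", best_city, best_score)  # ask "Você quis dizer ...?"
    return Decision("other", None, best_score)             # route as "outra"


result = decide("Canoa")
print(result.action, result.city)  # confirm Canoas
```

Character similarity is only a rough proxy: phrase-level mishearings such as “Ele Sim” for Erechim are unlikely to clear the threshold, so an alias list is still useful alongside it, and setting the confirm threshold too low can pull a genuinely unserved city onto a served one.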

Example prompt snippet you can adapt:

  • Task steps:
    1. Ask: “Para qual cidade você quer atendimento?” (“Which city would you like service in?”)
    2. Listen and map the response to one of: [Porto Alegre, Canoas, Blumenau, Lajeado, Campo Bom, Canela, Caxias do Sul, Erechim, Esteio, Gaspar, Gramado, Gravataí, Jaraguá do Sul, Novo Hamburgo, Rio Grande, São Leopoldo].
    3. If the response is a close phonetic match to an item in the list, infer the intended city; otherwise treat it as “outra”.
    4. Confirm: “Você quis dizer Gravataí, certo?” (“You meant Gravataí, right?”) If confirmed, set the variable cidade to cidade_credenciada; otherwise ask the caller to repeat.
    5. After confirmation, call the routing function with the normalized city.
  • Response guideline: “If ASR text looks slightly off but clearly references a known city (e.g., ‘Canoa’ ~ ‘Canoas’), infer the intended city and confirm once.”
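As a way to check the step sequence, here is a minimal simulation of steps 1 to 4 driven by scripted caller replies. It is a hypothetical sketch, not Retell behavior: the alias lookup, the list of “yes” words, and the two-attempt limit before falling back to outra are all assumptions.

```python
import unicodedata
from typing import Iterator

SERVED_CITIES = [
    "Porto Alegre", "Canoas", "Blumenau", "Lajeado", "Campo Bom", "Canela",
    "Caxias do Sul", "Erechim", "Esteio", "Gaspar", "Gramado", "Gravataí",
    "Jaraguá do Sul", "Novo Hamburgo", "Rio Grande", "São Leopoldo",
]
KNOWN_VARIANTS = {"Canoa": "Canoas", "Camela": "Canela", "Isteio": "Esteio"}
YES_WORDS = {"sim", "isso", "certo", "correto"}  # assumption: accepted "yes" replies


def fold(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


LOOKUP = {fold(city): city for city in SERVED_CITIES}
LOOKUP.update({fold(variant): city for variant, city in KNOWN_VARIANTS.items()})


def is_yes(answer: str) -> bool:
    """True when the first word of the reply is a 'yes' word, e.g. "Sim, isso"."""
    words = fold(answer).replace(",", " ").split()
    return bool(words) and words[0] in YES_WORDS


def capture_city(replies: Iterator[str], max_attempts: int = 2) -> str:
    """Run steps 1-4 against scripted caller replies and return the value for `cidade`."""
    for _ in range(max_attempts):
        heard = next(replies)           # steps 1-2: ask and listen
        city = LOOKUP.get(fold(heard))  # step 3: map to a served city
        if city is None:
            return "outra"              # not a served city: no confirmation needed
        confirmation = next(replies)    # step 4: "Você quis dizer {city}, certo?"
        if is_yes(confirmation):
            return "cidade_credenciada"
        # The caller said no: ask them to repeat and try again.
    return "outra"                      # assumption: give up after max_attempts


print(capture_city(iter(["Camela", "Não", "Canela", "Sim, isso"])))  # cidade_credenciada
```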

Optional tuning if you notice misses due to noise or very short answers:

Would you like help adding your city list to boosted_keywords and reviewing your prompt/flow state for the city‑capture step? If you share the agent_id and a short call example where “Canoas” was misrecognized, I can suggest exact field values and a tailored prompt block.
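Besides the dashboard setting, the list can be pushed through the API. The sketch below assumes the agent update endpoint is `PATCH https://api.retellai.com/update-agent/{agent_id}` with a Bearer API key and a `boosted_keywords` array in the JSON body; please verify those details against the Update Voice Agent reference before relying on it. It boosts only the canonical city names, not the misheard variants, since boosting a variant would likely push recognition toward the wrong spelling.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: base URL and endpoint path; verify against Retell's "Update Voice Agent" reference.
API_BASE = "https://api.retellai.com"

SERVED_CITIES = [
    "Porto Alegre", "Canoas", "Blumenau", "Lajeado", "Campo Bom", "Canela",
    "Caxias do Sul", "Erechim", "Esteio", "Gaspar", "Gramado", "Gravataí",
    "Jaraguá do Sul", "Novo Hamburgo", "Rio Grande", "São Leopoldo",
]


def build_agent_update(cities: List[str]) -> Dict[str, List[str]]:
    """JSON body that sets only boosted_keywords: trimmed, de-duplicated, order preserved."""
    cleaned = (c.strip() for c in cities)
    return {"boosted_keywords": list(dict.fromkeys(c for c in cleaned if c))}


def update_agent(agent_id: str, api_key: str, body: Dict[str, List[str]]) -> dict:
    """Send the body to the assumed PATCH /update-agent/{agent_id} endpoint; return the JSON reply."""
    request = urllib.request.Request(
        f"{API_BASE}/update-agent/{agent_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_agent_update(SERVED_CITIES)
print(json.dumps(body, ensure_ascii=False))  # accents are kept as-is
# update_agent("agent_39c89a178824081c553bf3c654", "YOUR_API_KEY", body)  # network call, not run here
```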

Best,
Evy AI
AI Support Agent @ Retell AI
Sources: List Voice Agents - Retell AI | Increase transcription accuracy - Retell AI | Create Voice Agent - Retell AI | Update Voice Agent - Retell AI | Balance between transcription accuracy and latency - Retell AI | Write a single prompt - Retell AI | Single/Multi Prompt Agent Overview - Retell AI | Prompt Engineering Guide - Retell AI | Integrate any system with custom function - Retell AI | Handle background speech & noise - Retell AI | Integrate LLM - Retell AI | Custom LLM Best Practices - Retell AI | Dynamic Variables - Retell AI

Thank you for the recommendations.

I would like to understand how I should add the boosted_keywords list in my current setup.

Right now, my flow is implemented inside the agent conversation, where the prompt captures the city mentioned by the caller and normalizes the result into a variable that is later used to route the call to the correct trunk in our PBX.

The current prompt we are using is the following:

At the start of the call, say:

“Para qual cidade você deseja agendar?” (“Which city would you like to schedule in?”)

Wait for the answer.

Identify the city the customer mentioned.

Normalize the answer to:

  • “cidade_credenciada” if it is Porto Alegre or Canoas or Canoa or Blumenau or Lajeado or Campo Bom or Canela or Camela or Caxias do Sul or Erechim or Ele Sim or Esteio or Isteio or Gaspar or Gramado or Gravataí or Gravar está aí or Gravata aí or Jaraguá do Sul or Novo Hamburgo or Rio Grande or São Leopoldo
  • “outra” for any other city

Save the normalized result in the variable “cidade”.

Ignore differences in accents, plurals, or small spelling variations when identifying the city.

After identifying the city, say:
“Vou transferir sua ligação para o setor responsável.” (“I will transfer your call to the responsible department.”)

Do not ask any additional questions.
After confirming the city and telling the caller they will be transferred, end your turn so the flow can proceed to the transfer.

The routing logic happens after this step based on the value of the cidade variable.
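The routing step that follows can be modeled as a lookup from the cidade value to a trunk. This is a hypothetical sketch: the trunk identifiers are placeholders, and sending unexpected or empty values to the outra trunk is a design assumption that mirrors an “else” branch.

```python
# Hypothetical trunk identifiers; replace with the real PBX trunk names.
TRUNKS = {
    "cidade_credenciada": "trunk-rede-credenciada",
    "outra": "trunk-outras-cidades",
}


def route(cidade: str) -> str:
    """Return the trunk for the normalized value; anything unexpected goes to the "outra" trunk."""
    return TRUNKS.get(cidade.strip(), TRUNKS["outra"])


print(route("cidade_credenciada"))  # trunk-rede-credenciada
print(route(""))                    # trunk-outras-cidades (fallback)
```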

I also attached an image showing how the current flow is configured.

Could you please clarify:

  1. How should I properly add the boosted_keywords list for the cities we serve?

  2. Should those keywords still appear inside the prompt, or should they only be configured in the agent settings?

  3. Is there a recommended structure for this kind of city capture step in voice calls?

For reference, the agent_id is:

agent_39c89a178824081c553bf3c654

(or the dashboard link:
https://dashboard.retellai.com/agents/agent_39c89a178824081c553bf3c654)

Any guidance on the best way to configure this would be very helpful.

Hello @suporte

You can do two things:

1- Add the keywords to the boosted keywords in the agent settings.

2- Use the Extract Dynamic Variable node to extract the city name,

with an enum type that lists the cities.

I also noticed something in your workflow, inside the Logic Split node:

When comparing two strings, cidade == "outra" will always evaluate to false; you have to reference the dynamic variable inside {{}}.

For example: {{cidade}} == "outra"

But the variable must first be given a value through the Extract Dynamic Variable node.
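A simplified model of why the braces matter: the condition is compared as text, and only `{{...}}` placeholders are replaced with variable values. This sketch is an assumption about general templating behavior, not Retell's actual evaluator; in particular, leaving an unset placeholder as literal text is just how this sketch behaves.

```python
import re


def render(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders with their values; unknown names are left untouched."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: variables.get(m.group(1), m.group(0)), template)


variables = {"cidade": "outra"}  # as set by the extraction step

# Without braces the left side is the literal text "cidade", so the comparison can never be true.
print(render("cidade", variables) == "outra")      # False
# With braces the placeholder is substituted first, then compared.
print(render("{{cidade}}", variables) == "outra")  # True
# If nothing ever set the variable, the placeholder survives and the check fails again.
print(render("{{cidade}}", {}) == "outra")         # False
```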

Thank you for the response.

I have already added the cities to the boosted keywords in the agent settings.

However, I could not find where to configure the Dynamic Variable Extraction node. Could you please clarify where this option is located in the Retell interface?

Also, where exactly should this node be placed within my current flow?

Another question: if I use the dynamic variable extraction with an enum for the cities, can I remove the list of cities from the prompt and leave them only in the enum configuration, letting the prompt simply ask for the city?

I want to confirm if the correct approach would be:

  1. The prompt only asks the user which city they want.

  2. The Dynamic Variable Extraction node captures the city and maps it using the enum list.

  3. The Logic Split then routes the call based on the extracted variable.

Could you confirm if this is the recommended approach?
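The three-step approach can be sketched as follows, assuming the enum holds the served cities plus a catch-all “outra”, so the Logic Split needs only one condition ({{cidade}} == "outra"). The extraction itself is done by Retell's LLM and is stubbed here as an input string; the guard that turns any out-of-enum value into outra is a defensive assumption, not documented behavior.

```python
from typing import Optional

# Hypothetical enum choices for the extraction step: served cities plus a catch-all.
CITY_ENUM = [
    "Porto Alegre", "Canoas", "Blumenau", "Lajeado", "Campo Bom", "Canela",
    "Caxias do Sul", "Erechim", "Esteio", "Gaspar", "Gramado", "Gravataí",
    "Jaraguá do Sul", "Novo Hamburgo", "Rio Grande", "São Leopoldo",
    "outra",
]


def coerce_to_enum(extracted: Optional[str]) -> str:
    """Step 2 guard: keep a value only if it is one of the enum choices, otherwise use "outra"."""
    return extracted if extracted in CITY_ENUM else "outra"


def logic_split(cidade: str) -> str:
    """Step 3 with a single condition, as in {{cidade}} == "outra"."""
    return "outra" if cidade == "outra" else "cidade_credenciada"


for value in ["Canoas", "Curitiba", None]:  # stand-ins for what the extraction might return
    print(value, "->", logic_split(coerce_to_enum(value)))
```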