Cartesia TTS: Spanish voice pronunciation bug, English accent applied to Spanish words

With Cartesia TTS sonic-3.5, we found that Spanish words are being mispronounced with an English accent, which sounds unnatural and confusing to native Latin American Spanish speakers. After investigating the issue directly through the Cartesia TTS API, we identified a specific parameter value that appears to be misconfigured and causes the incorrect pronunciation.

For the language parameter, several values are available to represent different Spanish dialects. For Latin American Spanish, the recommended setting is “es-MX”.

“es” = generic Spanish (Cartesia picks a neutral accent)
“es-MX” = Mexican Spanish
“es-ES” = Castilian Spanish (Spain)
“es-US” = US Spanish / Spanglish accent

All of these language values produce the correct accent and pronunciation. However, when the language parameter is set to English (“en”), it produces the same incorrect pronunciation that we are currently observing in Retell.

Could you please ask the team to look into this? Instead of setting the language parameter to “en”, can they update it to “es-MX”? Based on our testing, “es-MX” produces the correct Spanish pronunciation for Latin America, whereas “en” reproduces the pronunciation issue.

Variable to update in the backend Cartesia config → language = “en” (change to “es-MX”).

Hey @aamran Let me check with the team on this.

Hello @aamran the language picker is for STT, not TTS.

You need to choose a voice and voice model that matches the accent you want.

@Shaw Sorry about the confusion caused by the previous image. When selecting the Cartesia TTS voice in Retell, the voice traits indicate that it is Mexican; however, it does not consistently speak with a Mexican Spanish accent and pronunciation. Instead, it pronounces some words with an English accent and pronunciation. If a word also exists in the English vocabulary, the system tends to prioritize the English pronunciation and accent.

We were able to reproduce this issue using the Cartesia API when the language parameter was set to “en”.

Example Text:

Sí, quería quería contar de que tuve una buena jornada y, bueno, quería que me des algunos tips como para terminarla igual, de buena manera. ¿Eres un agente de asistente virtual

The pronunciation of “jornada” and “virtual” uses English phonetics, with the J and TU sounds pronounced according to English pronunciation rules.

The team can review the audio of the above text from the Google Drive link, as I’m unable to upload audio samples here.

Alternatively, they can verify and reproduce the issue themselves by using the provided example and generating audio with the language parameter set to en or es-MX.
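
A minimal sketch of that reproduction, in case it helps. It assumes Cartesia's REST endpoint `https://api.cartesia.ai/tts/bytes`, the `X-API-Key` and `Cartesia-Version` headers, and the payload field names shown; please verify these against the current Cartesia docs. The function names, the version date, and the output file names are placeholders of mine, and the voice ID must be supplied by whoever runs it.

```python
import json
import urllib.request

# Assumptions: endpoint, header names, and version date should be checked
# against the current Cartesia API documentation before use.
CARTESIA_URL = "https://api.cartesia.ai/tts/bytes"
CARTESIA_VERSION = "2024-06-10"

# Example text from this thread, copied verbatim.
TRANSCRIPT = (
    "Sí, quería quería contar de que tuve una buena jornada y, bueno, "
    "quería que me des algunos tips como para terminarla igual, "
    "de buena manera. ¿Eres un agente de asistente virtual"
)


def build_payload(language, voice_id, model_id="sonic-3.5"):
    """Build the request body; only `language` differs between runs."""
    return {
        "model_id": model_id,
        "transcript": TRANSCRIPT,
        "voice": {"mode": "id", "id": voice_id},
        "language": language,
        "output_format": {
            "container": "wav",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }


def synthesize(payload, api_key):
    """POST the payload and return the raw audio bytes."""
    request = urllib.request.Request(
        CARTESIA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Cartesia-Version": CARTESIA_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_comparison(api_key, voice_id, languages=("en", "es", "es-MX")):
    """Write one WAV file per language value so the clips can be compared."""
    for language in languages:
        audio = synthesize(build_payload(language, voice_id), api_key)
        with open(f"jornada_{language}.wav", "wb") as handle:
            handle.write(audio)
```

Calling `run_comparison(api_key, voice_id)` with the same voice for every language value keeps the language hint as the only variable, so any difference in how “jornada” and “virtual” sound can be attributed to it.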

@aamran I have escalated this to the team.

Hey @aamran Can you share the call ID?

Call Id = call_b8a1891ba1c87b190577b80ef0f

Hi @aamran Thanks for the call ID. I have escalated this to the team for review.

Any update on this?

Hey @aamran For call_b8a1891ba1c87b190577b80ef0f, your agent is configured as single-locale, so we deterministically send language="es" to Cartesia on every TTS request (never "en"). The English-like pronunciation you’re hearing (“Jornada”, “virtual”) is how Cartesia’s cartesia-Alejandro + sonic-3.5 behaves under the neutral "es" hint.

Near-term options: try another LatAm/MX Cartesia voice; longer-term, we can file a request to emit region codes (e.g. es-MX) for es-419-only agents.
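
As a rough illustration of the difference between the current behavior and the proposed one (a hypothetical sketch only, not Retell's actual backend code; the function name, the `emit_region_codes` flag, and the mapping table are all made up for this example):

```python
# Hypothetical mapping from an agent locale to a regional language hint.
# Only the es-419 -> es-MX pair comes from this thread.
REGION_OVERRIDES = {"es-419": "es-MX"}


def tts_language_hint(agent_locales, emit_region_codes=False):
    """Return the language hint for a TTS request.

    Returns None for multi-locale agents, whose handling is not
    described in this thread.
    """
    if len(agent_locales) != 1:
        return None
    locale = agent_locales[0]
    if emit_region_codes and locale in REGION_OVERRIDES:
        # Proposed behavior: send a region code instead of the bare language.
        return REGION_OVERRIDES[locale]
    # Current behavior as described above: send only the base language.
    return locale.split("-")[0].lower()
```

With the flag off, an es-419-only agent yields "es", matching what is sent today; with it on, the same agent would yield "es-MX".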

Thank You

@Shah-Fazal

we can file a request to emit region codes (e.g. es-MX) for es-419-only agents.

Can you assign that to this agent: agent_98123fc1c7d948c79729a9a165

With es-MX assigned to voice-id = custom_voice_36b2eddc930d8095593691254d

@aamran Escalated with the team

Please do notify me when it is done.

Hey @aamran Currently, we don’t have a per-agent override to emit es-MX, but the team has filed a feature request with engineering to add this.