For Cartesia TTS sonic-3.5, we found that words are being mispronounced with an English accent, which sounds unnatural and confusing to native Latin American Spanish speakers. After investigating the issue directly through the Cartesia TTS API, we identified that a specific parameter value is misconfigured, which causes the incorrect pronunciation.
For the language parameter, several values are available to represent different Spanish dialects. For Latin American Spanish, the recommended setting is “es-MX”.
All of the Spanish language values produce the correct accent and pronunciation. However, when the language parameter is set to English (en), the result is the same incorrect pronunciation that we are currently observing in Retell.
Could you please ask the team to look into this? Instead of setting the language parameter to en, can they update it to “es-MX”? Based on our testing, “es-MX” produces the correct Spanish pronunciation for Latin America, whereas “en” reproduces the pronunciation issue.
The variable to update in the backend Cartesia config: language = “en”.
@Shaw Sorry about the confusion caused by the previous image. When selecting the Cartesia TTS voice in Retell, the voice traits indicate that it is Mexican, but it does not consistently speak with a Mexican Spanish accent and pronunciation. Instead, it starts pronouncing words with an English accent and pronunciation. If a word also exists in the English vocabulary, the system tends to prioritize the English pronunciation and accent.
We were able to reproduce this issue using the Cartesia API when the language parameter was set to en.
Example Text:
Sí, quería quería contar de que tuve una buena jornada y, bueno, quería que me des algunos tips como para terminarla igual, de buena manera. ¿Eres un agente de asistente virtual
The pronunciation of “jornada” and “virtual” uses English phonetics: the J and TU sounds are pronounced according to English pronunciation rules.
The team can review the audio of the above text from the Google Drive link, as I’m unable to upload audio samples here.
Alternatively, they can verify and reproduce the issue themselves by using the provided example and generating audio with the language parameter set to en or es-MX.
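For anyone reproducing this, here is a minimal sketch of the comparison in Python using only the standard library. The endpoint URL, header names, and body fields reflect my understanding of Cartesia's TTS bytes API and should be checked against their current API reference. The voice ID, API version string, and output format are placeholders you need to supply, and "sonic-3.5" as the model_id string is an assumption.

```python
import json
import urllib.request

# Assumed endpoint and field names; verify against the current Cartesia API reference.
CARTESIA_TTS_URL = "https://api.cartesia.ai/tts/bytes"

# Example text from this thread, copied as-is.
EXAMPLE_TEXT = (
    "Sí, quería quería contar de que tuve una buena jornada y, bueno, "
    "quería que me des algunos tips como para terminarla igual, de buena manera. "
    "¿Eres un agente de asistente virtual"
)


def build_tts_request(language, api_key, api_version, voice_id, model_id="sonic-3.5"):
    """Build (but do not send) a TTS request for the given language value."""
    payload = {
        "model_id": model_id,  # assumed model identifier
        "transcript": EXAMPLE_TEXT,
        "voice": {"mode": "id", "id": voice_id},  # voice_id is a placeholder
        "language": language,  # the parameter under test: "en" vs "es-MX"
        "output_format": {
            "container": "wav",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }
    return urllib.request.Request(
        CARTESIA_TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Cartesia-Version": api_version,  # placeholder; use the version from the docs
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(language, out_path, **request_kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(language, **request_kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Calling synthesize once with "en" and once with "es-MX", keeping the same voice and text, gives two audio files to compare side by side on “jornada” and “virtual”.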
Hey @aamran For call_b8a1891ba1c87b190577b80ef0f, your agent is configured as single-locale, so we deterministically send language="es" to Cartesia on every TTS request (never "en"). The English-like pronunciation you’re hearing (“Jornada”, “virtual”) is how Cartesia’s cartesia-Alejandro + sonic-3.5 behaves under the neutral "es" hint.
Near-term options: try another LatAm/MX Cartesia voice; longer-term, we can file a request to emit region codes (e.g. es-MX) for es-419-only agents.
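To make the longer-term request concrete, here is a rough sketch of what the change could look like. This is purely illustrative and not Retell's actual code: the function name, the flag, and the es-419 to es-MX default are all hypothetical.

```python
# Hypothetical mapping from a regional agent locale to a region-coded TTS hint.
REGION_DEFAULTS = {"es-419": "es-MX"}


def cartesia_language_hint(agent_locales, emit_region_codes=False):
    """Pick the language value sent to TTS for a single-locale agent."""
    if len(agent_locales) != 1:
        raise ValueError("this sketch only covers single-locale agents")
    locale = agent_locales[0]
    if emit_region_codes and locale in REGION_DEFAULTS:
        # Proposed behavior: send a region code instead of the neutral hint.
        return REGION_DEFAULTS[locale]
    # Current behavior as described above: send only the base language.
    return locale.split("-")[0]
```

With the flag off, an es-419 agent yields "es" as it does today. With it on, the same agent would yield "es-MX".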