Hello Sérgio,
Thank you for reaching out with these detailed observations about your European Portuguese (PT-PT) voice agent.
Known Issues with European Portuguese
Your experience with PT-PT aligns with documented challenges. Users have reported that European Portuguese performs poorly, with voices consistently sounding like Brazilian Portuguese rather than European Portuguese(1). The only decent results reported have been with Realtime LLM voices, which mostly speak proper European Portuguese but tend to switch to Brazilian Portuguese at the end of a conversation, when saying “goodbye”(1).
To improve PT-EU performance, the recommendation is to use Realtime LLM voices and to add an instruction such as “Respond in European Portuguese” at both the start and the end of the prompt to avoid accent switching(1). For best results, clone a native PT-EU speaker’s voice(1).
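As a rough illustration of the "start and end" suggestion, here is a minimal Python sketch that wraps an agent prompt with the same language instruction on both sides. The function name, the constant, and the sample prompt are hypothetical and only for illustration; this is not a Retell AI API, and the resulting text would still need to be pasted into your agent's prompt field.

```python
# Hypothetical helper: names and the sample prompt are assumptions for illustration.
LANGUAGE_INSTRUCTION = "Respond in European Portuguese."


def wrap_with_language_instruction(
    base_prompt: str, instruction: str = LANGUAGE_INSTRUCTION
) -> str:
    """Return the prompt with the language instruction at the start and the end."""
    body = base_prompt.strip()
    # Repeat the instruction on both sides so it is the first and last thing the model reads.
    return f"{instruction}\n\n{body}\n\n{instruction}"


if __name__ == "__main__":
    prompt = wrap_with_language_instruction(
        "You are a booking assistant. Confirm the caller's details and say goodbye politely."
    )
    print(prompt)
```

Keeping the instruction in one constant makes it easy to try a more specific wording later without editing the rest of the prompt.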
Speech Recognition and ASR Challenges
Your issues with speech recognition, especially with email addresses and phone numbers, reflect broader ASR challenges. Retell AI has acknowledged that it is working on improvements to phone number and email address recognition(2). Users have reported names and phone numbers being transcribed incorrectly, and words being dropped during transcription(2).
ASR quality directly impacts every part of the AI voice agent experience(3). If the transcription is inaccurate, even the most advanced AI systems will misunderstand the user’s intent and deliver poor results(3). Strong ASR delivers faster, more accurate conversations and better intent recognition(3).
Recent ASR Improvements
Retell AI has rolled out a major upgrade to its Automatic Speech Recognition (ASR) engine, bringing sharper transcription, stronger intent detection, and more reliable call outcomes(4). The new ASR now supports 22+ new languages, bringing the total language count to 50+(4).
For European languages in Accurate mode, Word Error Rate has been cut by 7–10 points(4). This results in clearer transcripts even in noisy environments, more reliable intent capture for complex responses, and fewer repair turns(4).
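If you want to check how these figures translate to your own PT-PT calls, Word Error Rate can be measured from a handful of calls you transcribe by hand. WER is conventionally the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. The sketch below is a generic implementation of that definition, not Retell AI's evaluation method, and the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level edit distance, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word missing from hypothesis
                    curr[j - 1] + 1,     # insertion: extra word in hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: the ASR output drops one word out of seven.
    manual = "o meu número é nove um dois"
    asr_output = "o meu número é nove dois"
    print(f"WER: {word_error_rate(manual, asr_output):.2%}")
```

Comparing this number before and after a configuration change (for example, switching transcription mode) gives a concrete measure of whether recognition actually improved for your callers.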
Latency Considerations
Latency refers to the time delay between a user’s action and the system’s response(5). In AI voice interactions, it’s measured in milliseconds and can make or break the perceived quality of an AI-driven call experience(5). Humans expect near-instantaneous responses, usually within 300-500 milliseconds(5).
Retell AI achieves sub-600 ms latency, which is crucial for natural conversation flow(6). This low latency keeps delays during interactions to a minimum for customers(6).
Latency can occur at multiple points: Speech Recognition Processing (ASR), Response Generation, Speech Synthesis (TTS), and Network Transmission(5). Optimization strategies include using ultra-fast ASR and TTS engines, deploying AI models closer to the customer’s location, pre-loading likely responses, and optimizing API integrations(5).
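To see where delay accumulates, it can help to add up the four stages above as a simple latency budget. The sketch below does that in Python. All per-stage numbers are invented placeholders, not measurements from Retell AI, and the 600 ms target is taken only from the sub-600 ms figure quoted above; replace them with timings from your own calls.

```python
from typing import Dict

# Placeholder per-stage timings in milliseconds (assumed values, not measurements).
STAGE_LATENCY_MS: Dict[str, float] = {
    "asr": 150.0,                  # Speech Recognition Processing
    "response_generation": 250.0,  # LLM response generation
    "tts": 120.0,                  # Speech Synthesis
    "network": 60.0,               # Network Transmission
}

TARGET_MS = 600.0  # assumed budget, based on the sub-600 ms figure cited above


def total_latency_ms(stages: Dict[str, float]) -> float:
    """Sum the per-stage latencies into one end-to-end figure."""
    return sum(stages.values())


def slowest_stage(stages: Dict[str, float]) -> str:
    """Return the name of the stage contributing the most delay."""
    return max(stages, key=lambda name: stages[name])


if __name__ == "__main__":
    total = total_latency_ms(STAGE_LATENCY_MS)
    status = "within" if total <= TARGET_MS else "over"
    print(f"End-to-end: {total:.0f} ms ({status} the {TARGET_MS:.0f} ms target)")
    print(f"Largest contributor: {slowest_stage(STAGE_LATENCY_MS)}")
```

Breaking the total down this way shows which stage to optimize first, since shaving time from the largest contributor usually matters more than tuning the smaller ones.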
Recommendations
Based on the available information:
- Use Realtime LLM voices for PT-EU and add explicit prompts to maintain European Portuguese throughout the conversation(1)
- Consider voice cloning with a native PT-EU speaker for optimal results(1)
- Be aware that email and phone number recognition is an acknowledged area for improvement(2)
- Monitor ASR performance as Retell AI continues to enhance transcription accuracy(4)
For additional support with your specific configuration, I recommend: