Hi Retell team,
We’re seeing intermittent audio artifacts during Japanese-language agent speech when using Cartesia voices. Two recent examples:
Example 1 — short “squeak” at 0:06
https://dashboard.retellai.com/call-history?history=call_1070c24409eeb093a5d235d50fe
Example 2 — audible distortion at 1:59
https://dashboard.retellai.com/call-history?history=call_c62d54150b2f41c31ece28ac4a6
Both occurred mid-utterance during agent speech, not on the caller side, and both used a Cartesia voice. The artifacts are clearly audible to end users and risk eroding trust in the agent.
Could you:
- Investigate the root cause of the artifacts in these two calls
- Share any recommended configuration or mitigation we can apply on our side to reduce the chance of this recurring
- Let us know whether you have any monitoring or detection on your end that could flag synthesis anomalies like these, so we don’t have to rely on user-reported issues
Why we can’t simply switch providers: we’ve evaluated the other TTS options exposed through Retell for Japanese, and Cartesia is meaningfully better — pronunciation, prosody, and naturalness in Japanese are all noticeably ahead of the alternatives we tested. Moving off Cartesia would be a significant regression in voice quality for our users, so we’d much prefer a path that lets us keep using it.
Thanks,
Andrew