Audio artifacts in Japanese-language calls (Cartesia voice)

Hi Retell team,

We’re seeing intermittent audio artifacts during Japanese-language agent speech when using Cartesia voices. Two recent examples:

Example 1 — short “squeak” at 0:06

https://dashboard.retellai.com/call-history?history=call_1070c24409eeb093a5d235d50fe

Example 2 — audible distortion at 1:59

https://dashboard.retellai.com/call-history?history=call_c62d54150b2f41c31ece28ac4a6

Both occurred mid-utterance during agent speech, not on the caller side, and both used a Cartesia voice. The artifacts are clearly audible to end users and risk eroding trust in the agent.

Could you:

  1. Investigate the root cause of these specific calls
  2. Share any recommended configuration or mitigation we can apply on our side to reduce the chance of this recurring
  3. Let us know whether you have any monitoring or detection on your end that could flag synthesis anomalies like these, so we don’t have to rely on user-reported issues

Why we can’t simply switch providers: we’ve evaluated the other TTS options exposed through Retell for Japanese, and Cartesia is meaningfully better — pronunciation, prosody, and naturalness in Japanese are all noticeably ahead of the alternatives we tested. Moving off Cartesia would be a significant regression in voice quality for our users, so we’d much prefer a path that lets us keep using it.

Thanks,
Andrew

Hey @andrew3

Thanks for sharing this. I’ll forward it to our team for further investigation.

We’ll update you as soon as we have more information.

Best regards