Latency issues on inbound calls


Based on the available documentation, here’s how latency works in Retell AI’s architecture:

Where Latency Happens in Voice Automation:(1)

  1. Speech Recognition Processing (ASR) - Time taken to transcribe spoken words into text(1)
  2. Response Generation (NLG or LLM) - Time to understand and craft an appropriate, contextual reply(1)
  3. Speech Synthesis (TTS) - Time to turn the generated text back into spoken words(1)
  4. Network Transmission - Delays caused by sending audio and data between systems, especially in cloud setups(1)
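
To make the four stages above concrete, here is a minimal sketch of a per-turn latency budget. The class name, field names, and all timing values are made up for illustration and are not taken from Retell's API or from real call data. It also assumes the stages run one after another; pipelines that stream and overlap stages will show a lower end-to-end figure than a plain sum.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Hypothetical per-turn timings in milliseconds, one field per stage."""
    asr_ms: float      # speech recognition
    llm_ms: float      # response generation
    tts_ms: float      # speech synthesis
    network_ms: float  # transport between systems

    def total_ms(self) -> float:
        # Simple sum: assumes the stages do not overlap.
        return self.asr_ms + self.llm_ms + self.tts_ms + self.network_ms

    def dominant_stage(self) -> str:
        # Name of the stage that contributes the most time to the turn.
        stages = {
            "asr": self.asr_ms,
            "llm": self.llm_ms,
            "tts": self.tts_ms,
            "network": self.network_ms,
        }
        return max(stages, key=stages.get)


# Illustrative numbers only.
turn = TurnLatency(asr_ms=200, llm_ms=1400, tts_ms=250, network_ms=150)
print(turn.total_ms(), turn.dominant_stage())
```

If swapping TTS providers does not move the total, a breakdown like this can help show whether another stage (for example the LLM) dominates the turn.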

Regarding TTS Provider Traffic:

The documentation indicates that Retell AI uses a distributed architecture with edge deployment to minimize latency.(2) Speech is processed in multiple stages, and yes, there is network communication between Retell’s servers and TTS providers, which is one of the places where delay can be introduced.(1)

Additional Troubleshooting Steps:

Since you’ve already tried different TTS providers with no significant change, consider:

  • Check whether your configuration uses any features marked with the turtle icon (🐢), as these add latency(3)
  • Verify that your LLM response time is within the normal range (500-900 ms)(3)
  • Review your denoising mode setting - switching between “noise-cancellation” and “noise-and-background-speech-cancellation” can impact performance(4)
  • Check your STT mode - you can choose between “fast” and “accurate” modes(4)
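
The checks above can be sketched as a small helper. This is an illustration only: the `stt_mode` and `denoising_mode` keys are assumed names for the settings mentioned above and should be checked against your actual agent configuration, and the functions are hypothetical, not part of any Retell SDK. The only values taken from this answer are the 500-900 ms range and the four mode strings.

```python
from typing import Dict, List

# Normal LLM response-time range quoted above, in milliseconds.
LLM_NORMAL_RANGE_MS = (500, 900)

# The two values named above for each setting (key names are assumptions).
ALTERNATIVES = {
    "stt_mode": ("fast", "accurate"),
    "denoising_mode": (
        "noise-cancellation",
        "noise-and-background-speech-cancellation",
    ),
}


def llm_latency_status(llm_ms: float) -> str:
    """Classify a measured LLM response time against the quoted range."""
    low, high = LLM_NORMAL_RANGE_MS
    if llm_ms > high:
        return "above normal range"
    if llm_ms < low:
        return "below normal range"
    return "within normal range"


def settings_to_compare(config: Dict[str, str]) -> List[str]:
    """For each known setting, suggest the other value for an A/B test call."""
    suggestions = []
    for key, (first, second) in ALTERNATIVES.items():
        current = config.get(key)
        if current == first:
            suggestions.append(f"{key}: try '{second}' instead of '{first}'")
        elif current == second:
            suggestions.append(f"{key}: try '{first}' instead of '{second}'")
    return suggestions


# Hypothetical agent configuration and measurement.
example_config = {"stt_mode": "accurate", "denoising_mode": "noise-cancellation"}
print(llm_latency_status(1200))
for line in settings_to_compare(example_config):
    print(line)
```

Changing one setting at a time and comparing call latency before and after makes it easier to attribute any difference to that setting.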

For a detailed investigation of your specific issue, share your call IDs with support through the Discord community, or use the Contact Sales page.(3)