LLM Latency Issue

Currently hitting 5–10 seconds of latency using Claude Sonnet 4.6, which is delaying my agent's responses. Any idea how to fix this or connect my own API?

Thank you for reaching out to Retell AI Support. We’ve received your ticket and our team will respond within 8 hours.

Hi Retell,

It sounds like you’re seeing 5–10s response delays with Claude Sonnet and want ways to reduce latency or connect your own API.

Regarding high latency with LLM responses:

  • Check your agent’s estimated latency and, if the estimate is above ~1.5s, disable features marked with the turtle icon to reduce overhead.
  • Compare LLM response time to the normal 500–900ms range. If it’s consistently higher, switch to a faster tier or an alternative provider for the model.
  • If calls are international, using a local number closer to your users can help.
  • Also check the Status Page for any ongoing incidents.
    For details, see: Troubleshoot high latency.
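As a sketch of how you might check a call's latency breakdown programmatically: the snippet below inspects a call object such as the one returned by Retell's get-call endpoint. The response shape assumed here (a latency map of component name to percentile stats with a p50 key) is an assumption for illustration; verify the exact field names against the API reference.

```python
# Sketch: flag which components of a Retell call exceed a latency budget.
# The "latency" -> component -> {"p50": ...} shape is an assumption;
# check Retell's get-call API reference for the real field names.

def slow_components(call, threshold_ms=1500):
    """Return {component: p50_ms} for components whose median latency
    exceeds threshold_ms (default matches the ~1.5s guidance above)."""
    latency = call.get("latency", {})
    return {
        name: stats["p50"]
        for name, stats in latency.items()
        if isinstance(stats, dict) and stats.get("p50", 0) > threshold_ms
    }

# Example payload mirroring the numbers reported later in this thread
call = {
    "call_id": "call_8abf3943d5418a81468f7d608e7",
    "latency": {
        "llm": {"p50": 4800, "p90": 8700},
        "tts": {"p50": 250, "p90": 400},
    },
}

print(slow_components(call))  # only the LLM is over the 1.5s budget
```

Running this against each affected call makes it easy to confirm whether the LLM, TTS, or transcription step is the bottleneck.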

To connect your own API, you have two options:

  • Custom LLM (LLM WebSocket): Retell connects to your server via llm_websocket_url, streams live transcript and inputs, and your server returns responses/actions in real time. This gives you full control over response generation and latency tuning. Custom LLM Overview and LLM WebSocket.
  • Custom Functions: Keep using Retell’s LLM but call your own endpoints (GET/POST/PUT/PATCH/DELETE) during the conversation to fetch data or run logic. Custom Function and Integrate any system with custom function.
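To make the Custom LLM option concrete, here is a minimal sketch of the message handling your server would do on the other end of llm_websocket_url. The field names used (interaction_type, response_id, transcript, content, content_complete, end_call) are assumptions based on the LLM WebSocket docs, and the echo reply stands in for your own LLM call; verify the protocol details against the reference before building on this.

```python
import json

# Sketch of the server side of Retell's Custom LLM WebSocket.
# Field names below are assumptions; confirm them in the LLM
# WebSocket reference before relying on them.

def handle_retell_event(raw_message):
    """Turn one incoming WebSocket message into a reply string,
    or return None for events that need no response."""
    event = json.loads(raw_message)
    # Only answer when Retell asks for a response; ignore other events.
    if event.get("interaction_type") != "response_required":
        return None
    transcript = event.get("transcript", [])
    last_user_turn = transcript[-1]["content"] if transcript else ""
    # Plug your own LLM in here; echoing keeps the sketch self-contained.
    reply = f"You said: {last_user_turn}"
    return json.dumps({
        "response_id": event.get("response_id"),
        "content": reply,
        "content_complete": True,  # single-shot; stream chunks otherwise
        "end_call": False,
    })

msg = json.dumps({
    "interaction_type": "response_required",
    "response_id": 1,
    "transcript": [{"role": "user", "content": "What are your hours?"}],
})
print(handle_retell_event(msg))
```

Because you control this handler, you also control latency: you can call a faster model, stream partial content, or cache common answers.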

To help investigate your current latency, could you share:

  1. The affected call_id(s)
  2. Whether this is on phone or web, and the callers’ region
  3. Your current LLM model/tier selection
  4. Any features you have enabled that show the turtle icon

If helpful, a screenshot of your agent’s latency estimate/config would also speed things up.

Did this address your question, or would you like guidance setting up the Custom LLM WebSocket or Custom Functions?

Best,
Evy AI
AI Support Agent @ Retell AI
Sources: LLM WebSocket - Retell AI | Integrate any system with custom function - Retell AI | Custom Function - Retell AI | Troubleshoot high latency - Retell AI | Custom LLM Overview - Retell AI | Get Retell LLM - Retell AI

  1. call_8abf3943d5418a81468f7d608e7
     call_f13846829f25ba68a0aef0e743e
     call_4ec32934e56947685572eecbaf9
     call_2d16fa87bd6b07f661baa9a3781
  2. Phone, North America (NJ, USA)
  3. Claude Sonnet 4.6
  4. Speech Normalization and Safety Guardrails

Hi,

I reviewed the call IDs you provided. The latency analysis indicates that the LLM is the main cause of high latency. Claude Sonnet 4.6 shows a median response time of ~4.8s and up to ~8.7s at the 90th percentile. Transcription, TTS, and knowledge base latencies are within normal ranges.

This latency is expected for Claude Sonnet 4.6, as it is a heavier model. I recommend switching to a lighter, faster model such as Claude Haiku 4.5, GPT-4.1-mini, or GPT-4.1 to significantly reduce response times.
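For context on how median and 90th-percentile figures like those above are derived, here is a small sketch using Python's standard library. The sample values are illustrative, not the actual measurements from these calls.

```python
import statistics

# Illustrative per-response LLM latencies in ms (not real call data),
# chosen so the median lands near the ~4.8s figure discussed above.
samples_ms = [3200, 3900, 4300, 4700, 4800, 4800, 5400, 6200, 7500, 8700]

p50 = statistics.median(samples_ms)
# quantiles(n=10) returns the 9 decile cut points; index 8 is the p90
p90 = statistics.quantiles(samples_ms, n=10)[8]
print(f"p50 = {p50:.0f} ms, p90 = {p90:.0f} ms")
```

Comparing p50 and p90 like this is what separates a consistently slow model from one with occasional long-tail spikes.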

Best,
Stanley
Support @ Retell AI