LLM Latency Issue

Currently hitting 5–10 seconds of latency using Claude Sonnet 4.6, which is delaying my agent's responses. Any idea how to fix this or connect my own API?

Thank you for reaching out to Retell AI Support. We’ve received your ticket and our team will respond within 8 hours.

Hi Retell,

It sounds like you’re seeing 5–10s response delays with Claude Sonnet and want ways to reduce latency or connect your own API.

Regarding high latency with LLM responses:

  • Check your agent’s estimated latency and, if the estimate is above ~1.5s, disable features marked with the turtle icon to reduce overhead.
  • Compare LLM response time to the normal 500–900ms range. If it’s consistently higher, switch to a faster tier or an alternative provider for the model.
  • If calls are international, using a local number closer to your users can help.
  • Also check the Status Page for any ongoing incidents.
    For details, see: Troubleshoot high latency.
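As a sketch of how you might check a call's latency breakdown programmatically: the snippet below inspects a call object such as the one returned by Retell's get-call endpoint. The response shape assumed here (a latency map of component name to percentile stats with a p50 key) is an assumption for illustration; verify the exact field names against the API reference.

```python
# Sketch: flag which components of a Retell call exceed a latency budget.
# The "latency" -> component -> {"p50": ...} shape is an assumption;
# check Retell's get-call API reference for the real field names.

def slow_components(call, threshold_ms=1500):
    """Return {component: p50_ms} for components whose median latency
    exceeds threshold_ms (default matches the ~1.5s guidance above)."""
    latency = call.get("latency", {})
    return {
        name: stats["p50"]
        for name, stats in latency.items()
        if isinstance(stats, dict) and stats.get("p50", 0) > threshold_ms
    }

# Example payload mirroring the numbers reported later in this thread
call = {
    "call_id": "call_8abf3943d5418a81468f7d608e7",
    "latency": {
        "llm": {"p50": 4800, "p90": 8700},
        "tts": {"p50": 250, "p90": 400},
    },
}

print(slow_components(call))  # only the LLM is over the 1.5s budget
```

Running this against each affected call makes it easy to confirm whether the LLM, TTS, or transcription step is the bottleneck.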

To connect your own API, you have two options:

  • Custom LLM (LLM WebSocket): Retell connects to your server via llm_websocket_url, streams live transcript and inputs, and your server returns responses/actions in real time. This gives you full control over response generation and latency tuning. Custom LLM Overview and LLM WebSocket.
  • Custom Functions: Keep using Retell’s LLM but call your own endpoints (GET/POST/PUT/PATCH/DELETE) during the conversation to fetch data or run logic. Custom Function and Integrate any system with custom function.
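To make the Custom LLM option concrete, here is a minimal sketch of the message handling your server would do on the other end of llm_websocket_url. The field names used (interaction_type, response_id, transcript, content, content_complete, end_call) are assumptions based on the LLM WebSocket docs, and the echo reply stands in for your own LLM call; verify the protocol details against the reference before building on this.

```python
import json

# Sketch of the server side of Retell's Custom LLM WebSocket.
# Field names below are assumptions; confirm them in the LLM
# WebSocket reference before relying on them.

def handle_retell_event(raw_message):
    """Turn one incoming WebSocket message into a reply string,
    or return None for events that need no response."""
    event = json.loads(raw_message)
    # Only answer when Retell asks for a response; ignore other events.
    if event.get("interaction_type") != "response_required":
        return None
    transcript = event.get("transcript", [])
    last_user_turn = transcript[-1]["content"] if transcript else ""
    # Plug your own LLM in here; echoing keeps the sketch self-contained.
    reply = f"You said: {last_user_turn}"
    return json.dumps({
        "response_id": event.get("response_id"),
        "content": reply,
        "content_complete": True,  # single-shot; stream chunks otherwise
        "end_call": False,
    })

msg = json.dumps({
    "interaction_type": "response_required",
    "response_id": 1,
    "transcript": [{"role": "user", "content": "What are your hours?"}],
})
print(handle_retell_event(msg))
```

Because you control this handler, you also control latency: you can call a faster model, stream partial content, or cache common answers.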

To help investigate your current latency, could you share:

  1. The affected call_id(s)
  2. Whether this is on phone or web, and the callers’ region
  3. Your current LLM model/tier selection
  4. Any features you have enabled that show the turtle icon

If helpful, a screenshot of your agent’s latency estimate/config would also speed things up.

Did this address your question, or would you like guidance setting up the Custom LLM WebSocket or Custom Functions?

Best,
Evy AI
AI Support Agent @ Retell AI
Sources: LLM WebSocket - Retell AI | Integrate any system with custom function - Retell AI | Custom Function - Retell AI | Troubleshoot high latency - Retell AI | Custom LLM Overview - Retell AI | Get Retell LLM - Retell AI

  1. call_8abf3943d5418a81468f7d608e7
     call_f13846829f25ba68a0aef0e743e
     call_4ec32934e56947685572eecbaf9
     call_2d16fa87bd6b07f661baa9a3781
  2. Phone, North America (NJ, USA)
  3. Claude Sonnet 4.6
  4. Speech Normalization and Safety Guardrails

Hi,

I reviewed the call IDs you provided. The latency analysis indicates that the LLM is the main cause of high latency. Claude Sonnet 4.6 shows a median response time of ~4.8s and up to ~8.7s at the 90th percentile. Transcription, TTS, and knowledge base latencies are within normal ranges.

This latency is expected for Claude Sonnet 4.6, as it is a heavier model. I recommend switching to a lighter, faster model such as Claude Haiku 4.5, GPT-4.1-mini, or GPT-4.1 to significantly reduce response times.
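For context on how median and 90th-percentile figures like those above are derived, here is a small sketch using Python's standard library. The sample values are illustrative, not the actual measurements from these calls.

```python
import statistics

# Illustrative per-response LLM latencies in ms (not real call data),
# chosen so the median lands near the ~4.8s figure discussed above.
samples_ms = [3200, 3900, 4300, 4700, 4800, 4800, 5400, 6200, 7500, 8700]

p50 = statistics.median(samples_ms)
# quantiles(n=10) returns the 9 decile cut points; index 8 is the p90
p90 = statistics.quantiles(samples_ms, n=10)[8]
print(f"p50 = {p50:.0f} ms, p90 = {p90:.0f} ms")
```

Comparing p50 and p90 like this is what separates a consistently slow model from one with occasional long-tail spikes.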

Best,
Stanley
Support @ Retell AI