I’m experiencing abnormally high latency specifically in rigid mode (conversation flows with flex_mode: false). The same agents in flex mode work normally.
Observed Behavior:
Rigid mode: ~5000ms latency (unacceptable for voice conversations)
Flex mode: ~1300ms latency (normal)
Timeline:
Issue started: Not sure when, but the last time I made any adjustment was 10 Jan 2026, and it was working just fine.
Was this working before? Yes, rigid mode was working with normal latency previously
What I’ve Checked:
Retell status page shows all systems operational
No changes made to my flow structure today (only minor edge updates)
Multiple rigid mode agents affected, not just one
Flex mode agents on same account have normal latency
Impact:
Voice agents are unusable in production with 5-second response delays
Callers experience long awkward pauses between speaking and agent response
This is affecting live customer calls
Request:
Please investigate whether there’s a platform-side issue affecting rigid mode conversation flow processing. The fact that flex mode works normally while rigid mode is nearly 4x slower (~5000ms vs. ~1300ms) suggests this is not a configuration issue on my end.
Based on your description of rigid mode experiencing significantly higher latency (~5000ms) compared to flex mode (~1300ms), this may be a platform-side issue rather than a configuration problem on your end.
Relevant Context from Available Sources:
Flex Mode was introduced to allow agents to navigate flexibly between nodes, and you can control when your conversation agent switches between rigid and flexible behavior using components(1). The documentation indicates you can enable flex mode at the Agent or Component level, and it’s best for flows under 20 nodes(1).
There was a historical incident where Conversation Flow agents could stay silent after a function node due to a message-order issue; it has since been resolved(2). However, your issue appears different, since it specifically affects latency in rigid mode.
From community discussions, high latency issues have been reported with certain LLM models. One user experienced median latency over 3000ms and found that switching from GPT-5.1 to GPT-4.1 improved results, noting “GPT 5.1 isn’t good with latency at the moment”(3).
Recommended Actions:
Since the issue started sometime after January 10, 2026 (10 days before today, Tuesday, January 20, 2026) and you haven’t made structural changes, I recommend:
Contact Retell AI support directly with specific call IDs showing the latency difference between rigid and flex mode
Check if your rigid mode agents are using a different LLM model than your flex mode agents
Consider temporarily using flex mode for production while the issue is investigated
Glad to hear it’s resolved! Here are strategies to prevent similar latency issues in the future:
Monitoring & Alerting:
Retell AI provides real-time analytics dashboards that surface sentiment, fallback spikes, and CSAT trends, letting ops teams iterate daily(1). Set up monitoring for:
Average latency metrics per agent
Comparison between rigid and flex mode performance
Call completion rates and quality metrics
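As a rough sketch of the rigid-vs-flex comparison above, the Python below groups per-call latency samples by agent and mode, computes medians, and flags when rigid mode is much slower than flex mode. The record fields (agent_id, mode, latency_ms), the sample numbers, and the 2.0 alert ratio are all hypothetical; the data would come from wherever you export call latency, not from a specific Retell AI API.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-call records; field names are placeholders, not a Retell AI schema.
CALLS = [
    {"agent_id": "agent_a", "mode": "rigid", "latency_ms": 4900},
    {"agent_id": "agent_a", "mode": "rigid", "latency_ms": 5100},
    {"agent_id": "agent_a", "mode": "flex", "latency_ms": 1250},
    {"agent_id": "agent_a", "mode": "flex", "latency_ms": 1350},
    {"agent_id": "agent_b", "mode": "rigid", "latency_ms": 5000},
    {"agent_id": "agent_b", "mode": "flex", "latency_ms": 1300},
]


def median_latency(records, key):
    """Group records by key(record) and return the median latency_ms per group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["latency_ms"])
    return {group: median(values) for group, values in groups.items()}


def rigid_to_flex_ratio(records):
    """Median rigid-mode latency divided by median flex-mode latency."""
    by_mode = median_latency(records, key=lambda r: r["mode"])
    return by_mode["rigid"] / by_mode["flex"]


def mode_gap_alert(records, max_ratio=2.0):
    """True when rigid mode is more than max_ratio times slower than flex mode."""
    return rigid_to_flex_ratio(records) > max_ratio


if __name__ == "__main__":
    # Per-agent, per-mode medians, then the overall rigid/flex ratio.
    print(median_latency(CALLS, key=lambda r: (r["agent_id"], r["mode"])))
    print(f"rigid/flex ratio: {rigid_to_flex_ratio(CALLS):.2f}")
    print("alert:", mode_gap_alert(CALLS))
```

With these sample numbers the ratio comes out to about 3.85, in line with the ~5000ms vs. ~1300ms gap reported in this thread.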
Best Practices for Latency Optimization:
Production voice AI agents typically aim for 800ms or lower latency to maintain conversational flow(2). Retell AI reports sub-500ms latency for human-like conversation flow(1).
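To check measured latency against a target such as the 800ms figure above, one option is to look at percentiles rather than a single average, since a handful of slow turns is what callers notice. The sketch below uses a simple nearest-rank percentile; the function names and sample values are hypothetical, and 800ms is only the rule-of-thumb target mentioned above, not a Retell AI setting.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need non-empty samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_report(samples, target_ms=800):
    """Return p50/p90 latency and whether each stays within target_ms."""
    p50 = percentile(samples, 50)
    p90 = percentile(samples, 90)
    return {
        "p50_ms": p50,
        "p90_ms": p90,
        "p50_ok": p50 <= target_ms,
        "p90_ok": p90 <= target_ms,
    }


if __name__ == "__main__":
    # Hypothetical per-turn latencies (ms) for one agent.
    print(latency_report([620, 710, 750, 790, 1400]))
```

Here the median turn meets the target while p90 does not, which is the kind of gap an average alone can hide.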
Key factors affecting latency include:
LLM Selection: The choice of LLM significantly impacts response time, and some models have higher latency than others(2) (compare the community report above of GPT-4.1 responding faster than GPT-5.1).
Transcription Mode: You can balance between transcription accuracy and latency by selecting the right mode(3):
Optimize for speed: Uses latest interim results with low endpointing for downstream processing
Optimize for accuracy: Uses results with a higher endpointing setting, adding roughly 200ms of latency(3)
From benchmarking, both modes have similar WER (Word Error Rate), but the difference lies in capturing entities like numbers and dates. If your use case relies heavily on capturing these entities well, use optimize for accuracy mode. Otherwise, use optimize for speed mode for best latency(3).
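The rule of thumb above can be written as a tiny helper that also estimates the latency cost of the choice. This is a sketch only: the mode names are plain labels mirroring the two options described above, not API values, and the 200ms overhead is the approximate figure quoted from the documentation.

```python
# Approximate extra latency of "optimize for accuracy" mode, per the figure quoted above.
ACCURACY_OVERHEAD_MS = 200


def choose_transcription_mode(relies_on_entities, speed_mode_latency_ms):
    """Pick a transcription mode label and estimate the resulting latency.

    relies_on_entities: True if the flow depends on capturing numbers, dates, etc.
    speed_mode_latency_ms: latency you measured with "optimize for speed".
    """
    if relies_on_entities:
        return "optimize for accuracy", speed_mode_latency_ms + ACCURACY_OVERHEAD_MS
    return "optimize for speed", speed_mode_latency_ms


if __name__ == "__main__":
    print(choose_transcription_mode(True, 550))   # e.g. appointment booking with dates
    print(choose_transcription_mode(False, 550))  # e.g. general FAQ agent
```

Comparing the estimated figure with the ~800ms target mentioned earlier shows whether the accuracy trade-off still leaves headroom.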
Preventive Measures:
Test both modes: Regularly test your agents in both rigid and flex mode to catch performance differences early
Monitor the status page: Keep an eye on Retell’s status page for platform updates
Version control: Track when you make configuration changes so you can correlate issues with specific updates
Use flex mode strategically: Flex mode is best for flows under 20 nodes and allows agents to navigate flexibly between nodes(1)
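For the version-control point, even a simple hand-kept change log makes correlation easier. The sketch below finds the most recent recorded change on or before the date degradation was first noticed. The log entries are hypothetical examples (only the January 10, 2026 date comes from this thread), and this is not a Retell AI feature.

```python
from datetime import date

# Hypothetical hand-maintained change log: (date of change, description).
CHANGE_LOG = [
    (date(2025, 12, 15), "example: added a transfer node"),
    (date(2026, 1, 10), "example: adjusted edge conditions"),
]


def last_change_before(changes, detected_on):
    """Return the latest (date, description) on or before detected_on, or None."""
    earlier = [entry for entry in changes if entry[0] <= detected_on]
    return max(earlier, key=lambda entry: entry[0]) if earlier else None


if __name__ == "__main__":
    print(last_change_before(CHANGE_LOG, date(2026, 1, 20)))
```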
Consider setting up automated testing that runs sample calls through your agents periodically to detect latency degradation before it impacts production customers.
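A periodic check along these lines could compare latencies from recent test calls against a known-good baseline. How the test calls are placed and timed is left out; the function name, sample values, and the 1.5x tolerance are hypothetical choices, not Retell AI defaults.

```python
from statistics import median


def latency_regressed(baseline_ms, recent_ms, tolerance=1.5):
    """True if the recent median latency exceeds the baseline median by more than tolerance times."""
    if not baseline_ms or not recent_ms:
        raise ValueError("need at least one baseline sample and one recent sample")
    return median(recent_ms) > tolerance * median(baseline_ms)


if __name__ == "__main__":
    baseline = [1200, 1300, 1400]  # hypothetical known-good test-call latencies (ms)
    print(latency_regressed(baseline, [1250, 1350, 1300]))  # -> False
    print(latency_regressed(baseline, [4800, 5000, 5200]))  # -> True
```

Comparing medians keeps one unusually slow test call from triggering an alert on its own; adjust the tolerance to match how noisy your test calls are.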