Rigid Mode Latency Issue - 5000ms vs Normal 1300ms in Flex Mode

I’m experiencing abnormally high latency specifically in rigid mode (conversation flows with flex_mode: false). The same agents in flex mode work normally.

Observed Behavior:

  • Rigid mode: ~5000ms latency (unacceptable for voice conversations)

  • Flex mode: ~1300ms latency (normal)

Timeline:

  • Issue started: Not sure when, but the last time I made any adjustment was 10 Jan 2026, and it was working just fine.

  • Was this working before? Yes, rigid mode was working with normal latency previously

What I’ve Checked:

  • Retell status page shows all systems operational

  • No changes made to my flow structure today (only minor edge updates)

  • Multiple rigid mode agents affected, not just one

  • Flex mode agents on same account have normal latency

Impact:

  • Voice agents are unusable in production with 5-second response delays

  • Callers experience long awkward pauses between speaking and agent response

  • This is affecting live customer calls

Request:
Please investigate whether there’s a platform-side issue affecting rigid mode conversation flow processing. The fact that flex mode works normally while rigid mode is nearly 4x slower suggests this is not a configuration issue on my end.

Based on your description of rigid mode experiencing significantly higher latency (~5000ms) compared to flex mode (~1300ms), this appears to be a platform-side issue rather than a configuration problem on your end.

Relevant Context from Available Sources:

Flex Mode was introduced to allow agents to navigate flexibly between nodes, and you can control when your conversation agent switches between rigid and flexible behavior using components(1). The documentation indicates you can enable flex mode at the Agent or Component level, and it’s best for flows under 20 nodes(1).

There was a historical incident where Conversation Flow agents could stay silent after a function node due to a message-order issue, which has since been resolved(2). However, your issue appears different since it specifically affects latency in rigid mode.

From community discussions, high latency issues have been reported with certain LLM models. One user experienced median latency over 3000ms and found that switching from GPT 5.1 to GPT 4.1 improved results, noting “GPT 5.1 isn’t good with latency at the moment”(3).

Recommended Actions:

Since your issue started sometime after January 10, 2026 (within the last 10 days as of January 20, 2026) and you haven’t made structural changes, I recommend:

  1. Contact Retell AI support directly with specific call IDs showing the latency difference between rigid and flex mode
  2. Check whether your rigid mode agents are using a different LLM model than your flex mode agents (the sketch after this list shows one way to compare)
  3. Consider temporarily using flex mode for production while the issue is investigated
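
For item 2 above, the comparison can also be scripted. Below is a minimal sketch in Python, assuming the Retell REST API exposes a `GET /list-agents` endpoint and that each agent object carries a `response_engine` field, as described in the API reference; treat the exact paths and field names as assumptions to verify against the current docs.

```python
# Minimal sketch: list agents and print which response engine / model each
# one uses, so rigid- and flex-mode agents can be compared side by side.
# The endpoint path and field names are assumptions based on Retell's API
# reference -- verify them against the current docs before relying on this.
import os

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}

resp = requests.get(f"{API_BASE}/list-agents", headers=HEADERS, timeout=30)
resp.raise_for_status()

for agent in resp.json():
    engine = agent.get("response_engine", {})
    print(
        agent.get("agent_id"),
        agent.get("agent_name"),
        engine.get("type"),  # e.g. "conversation-flow" or "retell-llm"
        engine.get("llm_id") or engine.get("conversation_flow_id"),
    )
```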

Get Direct Support:

When you contact Retell support, provide the following (the sketch after this list shows one way to pull these details from the API):

  • Call IDs from both rigid and flex mode agents
  • Your agent configuration
  • The exact timestamp when you first noticed the degradation
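
Here is a rough sketch of how those details could be pulled programmatically, assuming a `GET /v2/get-call/{call_id}` endpoint and a `latency` summary on the call object (both per Retell’s API reference; double-check the exact field names before relying on them). The call IDs below are placeholders.

```python
# Rough sketch: fetch latency summaries and timestamps for a few known call
# IDs so rigid vs. flex numbers can be attached to the support ticket.
# Assumes GET /v2/get-call/{call_id} and a `latency.e2e` summary on the call
# object -- verify the field names against the current API reference.
import os

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}

# Placeholder call IDs -- replace with real ones from each mode.
CALL_IDS = {
    "rigid": ["call_rigid_example_1"],
    "flex": ["call_flex_example_1"],
}

for mode, call_ids in CALL_IDS.items():
    for call_id in call_ids:
        resp = requests.get(f"{API_BASE}/v2/get-call/{call_id}", headers=HEADERS, timeout=30)
        resp.raise_for_status()
        call = resp.json()
        e2e = call.get("latency", {}).get("e2e", {})
        print(mode, call_id, call.get("start_timestamp"),
              f"e2e p50={e2e.get('p50')}ms p90={e2e.get('p90')}ms")
```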

Hello @cleven,
There was a small issue with the node transition, and it has just been resolved. Please try again now.

Got it, yep indeed resolved. Now would just like to find a way to prevent that from happening again.

Glad to hear it’s resolved! Here are strategies to prevent similar latency issues in the future:

Monitoring & Alerting:

Retell AI provides real-time analytics dashboards that surface sentiment, fallback spikes, and CSAT trends, letting ops teams iterate daily(1). Set up monitoring for the following (a minimal polling sketch follows this list):

  • Average latency metrics per agent
  • Comparison between rigid and flex mode performance
  • Call completion rates and quality metrics
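
The cited sources don’t prescribe a specific setup for this, but a simple starting point is a scheduled job that pulls recent calls per agent and flags any whose end-to-end latency exceeds a threshold. A minimal sketch, assuming a `POST /v2/list-calls` endpoint that accepts `filter_criteria.agent_id` and returns calls carrying a `latency.e2e` summary (verify both against the current API reference):

```python
# Minimal sketch of a scheduled latency check: list recent calls per agent
# and flag any whose median end-to-end latency exceeds a threshold.
# The endpoint path, filter shape, and latency fields are assumptions based
# on Retell's API reference -- confirm them before deploying this.
import os

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}

AGENT_IDS = ["agent_rigid_example", "agent_flex_example"]  # replace with real IDs
LATENCY_ALERT_MS = 2000  # alert when a call's p50 end-to-end latency exceeds this


def recent_calls(agent_id: str, limit: int = 20) -> list:
    """Return the most recent calls for one agent (shape per the API reference)."""
    body = {
        "filter_criteria": {"agent_id": [agent_id]},
        "sort_order": "descending",
        "limit": limit,
    }
    resp = requests.post(f"{API_BASE}/v2/list-calls", json=body, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()


for agent_id in AGENT_IDS:
    for call in recent_calls(agent_id):
        p50 = call.get("latency", {}).get("e2e", {}).get("p50")
        if p50 is not None and p50 > LATENCY_ALERT_MS:
            # Hook in real alerting here (Slack webhook, PagerDuty, email, ...).
            print(f"ALERT: agent {agent_id} call {call.get('call_id')} e2e p50 = {p50}ms")
```

The 2000ms threshold here is arbitrary; given that normal flex-mode latency in this thread is around 1300ms, anything sustained well above that is worth investigating.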

Best Practices for Latency Optimization:

Production voice AI agents typically aim for 800ms or lower latency to maintain conversational flow(2). Retell AI achieves sub-500ms latency for human-like conversation flow(1).

Key factors affecting latency include:

  1. LLM Selection: The choice of LLM significantly impacts response time. Some models have higher latency than others(2).

  2. Transcription Mode: You can balance between transcription accuracy and latency by selecting the right mode(3):

    • Optimize for speed: Uses the latest interim results with a low endpointing setting for downstream processing
    • Optimize for accuracy: Uses results with a higher endpointing setting, incurring ~200ms latency(3)

From benchmarking, both modes have similar WER (Word Error Rate), but the difference lies in capturing entities like numbers and dates. If your use case relies heavily on capturing these entities well, use optimize for accuracy mode. Otherwise, use optimize for speed mode for best latency(3).

Preventive Measures:

  1. Test both modes: Regularly test your agents in both rigid and flex mode to catch performance differences early
  2. Monitor the status page: Keep an eye on Retell’s status page for platform updates
  3. Version control: Track when you make configuration changes so you can correlate issues with specific updates (a snapshot sketch follows this list)
  4. Use flex mode strategically: Flex mode is best for flows under 20 nodes and allows agents to navigate flexibly between nodes(1)
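
For item 3, one lightweight approach (not something the cited sources mandate) is to snapshot agent configurations into files tracked in git, so any regression can be correlated with a specific configuration diff. A minimal sketch, again assuming a `GET /list-agents` endpoint that returns full agent objects:

```python
# Minimal sketch for point 3 above: dump each agent's configuration to a
# JSON file so changes can be committed to git and diffed when a problem
# appears. Assumes GET /list-agents returns full agent objects -- verify.
import json
import os
import pathlib

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}
SNAPSHOT_DIR = pathlib.Path("agent-snapshots")  # commit this directory to git
SNAPSHOT_DIR.mkdir(exist_ok=True)

resp = requests.get(f"{API_BASE}/list-agents", headers=HEADERS, timeout=30)
resp.raise_for_status()

for agent in resp.json():
    path = SNAPSHOT_DIR / f"{agent['agent_id']}.json"
    path.write_text(json.dumps(agent, indent=2, sort_keys=True))
    print("wrote", path)
```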

Get Proactive Support:

Consider setting up automated testing that runs sample calls through your agents periodically to detect latency degradation before it impacts production customers.
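
As a concrete starting point, the sketch below places a test call from a Retell number to a line you control and then reads back the call’s end-to-end latency once it finishes. The `POST /v2/create-phone-call` and `GET /v2/get-call/{call_id}` endpoints, the `call_status` values, and the `latency.e2e` shape are assumptions based on Retell’s API reference, and the phone numbers are placeholders; verify all of them against the current docs.

```python
# Sketch of a periodic synthetic test: place a call from a Retell number
# (which uses whatever agent is bound to that number) to a test line you
# control, wait for it to end, then check its end-to-end latency.
# Endpoints, request fields, call_status values, and the latency shape are
# assumptions to verify against Retell's current API reference.
import os
import time

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}
LATENCY_BUDGET_MS = 2000  # flag the run as degraded above this median e2e latency


def run_test_call(from_number: str, to_number: str) -> str:
    body = {"from_number": from_number, "to_number": to_number}
    resp = requests.post(f"{API_BASE}/v2/create-phone-call", json=body, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()["call_id"]


def check_latency(call_id: str) -> None:
    # Poll until the call has ended, then inspect its latency summary.
    while True:
        call = requests.get(f"{API_BASE}/v2/get-call/{call_id}", headers=HEADERS, timeout=30).json()
        if call.get("call_status") not in ("registered", "ongoing"):
            break
        time.sleep(10)
    p50 = call.get("latency", {}).get("e2e", {}).get("p50")
    status = "OK" if p50 is not None and p50 <= LATENCY_BUDGET_MS else "DEGRADED"
    print(f"{status}: call {call_id} e2e p50 = {p50}ms")


call_id = run_test_call("+15550000001", "+15550000002")  # placeholder numbers
check_latency(call_id)
```

Run it on a schedule (cron, CI, etc.) against one rigid-mode and one flex-mode agent so the two numbers stay directly comparable over time.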