Rigid Mode Latency Issue - 5000ms vs Normal 1300ms in Flex Mode

I’m experiencing abnormally high latency specifically in rigid mode (conversation flows with flex_mode: false). The same agents in flex mode work normally.

Observed Behavior:

  • Rigid mode: ~5000ms latency (unacceptable for voice conversations)

  • Flex mode: ~1300ms latency (normal)

Timeline:

  • Issue started: Not sure when, but the last time I made any adjustment was 10 Jan 2026, and it was working just fine.

  • Was this working before? Yes, rigid mode was working with normal latency previously

What I’ve Checked:

  • Retell status page shows all systems operational

  • No changes made to my flow structure today (only minor edge updates)

  • Multiple rigid mode agents affected, not just one

  • Flex mode agents on same account have normal latency

Impact:

  • Voice agents are unusable in production with 5-second response delays

  • Callers experience long awkward pauses between speaking and agent response

  • This is affecting live customer calls

Request:
Please investigate whether there’s a platform-side issue affecting rigid mode conversation flow processing. The fact that flex mode works normally while rigid mode is nearly 4x slower suggests this is not a configuration issue on my end.

Based on your description of rigid mode experiencing significantly higher latency (~5000ms) compared to flex mode (~1300ms), this appears to be a platform-side issue rather than a configuration problem on your end.

Relevant Context from Available Sources:

Flex Mode was introduced to allow agents to navigate flexibly between nodes, and you can control when your conversation agent switches between rigid and flexible behavior using components(1). The documentation indicates you can enable flex mode at the Agent or Component level, and it’s best for flows under 20 nodes(1).

There was a historical incident where Conversation Flow agents could stay silent after a function node due to a message-order issue, which has since been resolved(2). However, your issue appears different since it specifically affects latency in rigid mode.

From community discussions, high latency issues have been reported with certain LLM models. One user experienced median latency over 3000ms and found that switching from GPT 5.1 to GPT 4.1 improved results, noting “GPT 5.1 isn’t good with latency at the moment”(3).

Recommended Actions:

Since your issue started sometime after January 10, 2026 (within the last 10 days as of January 20, 2026) and you haven’t made structural changes, I recommend:

  1. Contact Retell AI support directly with specific call IDs showing the latency difference between rigid and flex mode
  2. Check whether your rigid mode agents are using a different LLM model than your flex mode agents (the sketch after this list shows one way to compare)
  3. Consider temporarily using flex mode for production while the issue is investigated
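
For item 2 above, the comparison can also be scripted. Below is a minimal sketch in Python, assuming the Retell REST API exposes a `GET /list-agents` endpoint and that each agent object carries a `response_engine` field, as described in the API reference; treat the exact paths and field names as assumptions to verify against the current docs.

```python
# Minimal sketch: list agents and print which response engine / model each
# one uses, so rigid- and flex-mode agents can be compared side by side.
# The endpoint path and field names are assumptions based on Retell's API
# reference -- verify them against the current docs before relying on this.
import os

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}

resp = requests.get(f"{API_BASE}/list-agents", headers=HEADERS, timeout=30)
resp.raise_for_status()

for agent in resp.json():
    engine = agent.get("response_engine", {})
    print(
        agent.get("agent_id"),
        agent.get("agent_name"),
        engine.get("type"),  # e.g. "conversation-flow" or "retell-llm"
        engine.get("llm_id") or engine.get("conversation_flow_id"),
    )
```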

Get Direct Support:

When you contact Retell support, provide the following (the sketch after this list shows one way to pull these details from the API):

  • Call IDs from both rigid and flex mode agents
  • Your agent configuration
  • The exact timestamp when you first noticed the degradation
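
Here is a rough sketch of how those details could be pulled programmatically, assuming a `GET /v2/get-call/{call_id}` endpoint and a `latency` summary on the call object (both per Retell’s API reference; double-check the exact field names before relying on them). The call IDs below are placeholders.

```python
# Rough sketch: fetch latency summaries and timestamps for a few known call
# IDs so rigid vs. flex numbers can be attached to the support ticket.
# Assumes GET /v2/get-call/{call_id} and a `latency.e2e` summary on the call
# object -- verify the field names against the current API reference.
import os

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}

# Placeholder call IDs -- replace with real ones from each mode.
CALL_IDS = {
    "rigid": ["call_rigid_example_1"],
    "flex": ["call_flex_example_1"],
}

for mode, call_ids in CALL_IDS.items():
    for call_id in call_ids:
        resp = requests.get(f"{API_BASE}/v2/get-call/{call_id}", headers=HEADERS, timeout=30)
        resp.raise_for_status()
        call = resp.json()
        e2e = call.get("latency", {}).get("e2e", {})
        print(mode, call_id, call.get("start_timestamp"),
              f"e2e p50={e2e.get('p50')}ms p90={e2e.get('p90')}ms")
```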

Hello @cleven,
There was a small issue with the node transition, and it has just been resolved. Please try again now.

Got it, yep indeed resolved. Now would just like to find a way to prevent that from happening again.

Glad to hear it’s resolved! Here are strategies to prevent similar latency issues in the future:

Monitoring & Alerting:

Retell AI provides real-time analytics dashboards that surface sentiment, fallback spikes, and CSAT trends, letting ops teams iterate daily(1). Set up monitoring for the following (a minimal polling sketch follows this list):

  • Average latency metrics per agent
  • Comparison between rigid and flex mode performance
  • Call completion rates and quality metrics
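
The cited sources don’t prescribe a specific setup for this, but a simple starting point is a scheduled job that pulls recent calls per agent and flags any whose end-to-end latency exceeds a threshold. A minimal sketch, assuming a `POST /v2/list-calls` endpoint that accepts `filter_criteria.agent_id` and returns calls carrying a `latency.e2e` summary (verify both against the current API reference):

```python
# Minimal sketch of a scheduled latency check: list recent calls per agent
# and flag any whose median end-to-end latency exceeds a threshold.
# The endpoint path, filter shape, and latency fields are assumptions based
# on Retell's API reference -- confirm them before deploying this.
import os

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}

AGENT_IDS = ["agent_rigid_example", "agent_flex_example"]  # replace with real IDs
LATENCY_ALERT_MS = 2000  # alert when a call's p50 end-to-end latency exceeds this


def recent_calls(agent_id: str, limit: int = 20) -> list:
    """Return the most recent calls for one agent (shape per the API reference)."""
    body = {
        "filter_criteria": {"agent_id": [agent_id]},
        "sort_order": "descending",
        "limit": limit,
    }
    resp = requests.post(f"{API_BASE}/v2/list-calls", json=body, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()


for agent_id in AGENT_IDS:
    for call in recent_calls(agent_id):
        p50 = call.get("latency", {}).get("e2e", {}).get("p50")
        if p50 is not None and p50 > LATENCY_ALERT_MS:
            # Hook in real alerting here (Slack webhook, PagerDuty, email, ...).
            print(f"ALERT: agent {agent_id} call {call.get('call_id')} e2e p50 = {p50}ms")
```

The 2000ms threshold here is arbitrary; given that normal flex-mode latency in this thread is around 1300ms, anything sustained well above that is worth investigating.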

Best Practices for Latency Optimization:

Production voice AI agents typically aim for 800ms or lower latency to maintain conversational flow(2). Retell AI achieves sub-500ms latency for human-like conversation flow(1).

Key factors affecting latency include:

  1. LLM Selection: The choice of LLM significantly impacts response time. Some models have higher latency than others(2).

  2. Transcription Mode: You can balance between transcription accuracy and latency by selecting the right mode(3):

    • Optimize for speed: Uses the latest interim results with a low endpointing setting for downstream processing
    • Optimize for accuracy: Uses results with a higher endpointing setting, incurring ~200ms latency(3)

From benchmarking, both modes have similar WER (Word Error Rate), but the difference lies in capturing entities like numbers and dates. If your use case relies heavily on capturing these entities well, use optimize for accuracy mode. Otherwise, use optimize for speed mode for best latency(3).

Preventive Measures:

  1. Test both modes: Regularly test your agents in both rigid and flex mode to catch performance differences early
  2. Monitor the status page: Keep an eye on Retell’s status page for platform updates
  3. Version control: Track when you make configuration changes so you can correlate issues with specific updates (a snapshot sketch follows this list)
  4. Use flex mode strategically: Flex mode is best for flows under 20 nodes and allows agents to navigate flexibly between nodes(1)
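
For item 3, one lightweight approach (not something the cited sources mandate) is to snapshot agent configurations into files tracked in git, so any regression can be correlated with a specific configuration diff. A minimal sketch, again assuming a `GET /list-agents` endpoint that returns full agent objects:

```python
# Minimal sketch for point 3 above: dump each agent's configuration to a
# JSON file so changes can be committed to git and diffed when a problem
# appears. Assumes GET /list-agents returns full agent objects -- verify.
import json
import os
import pathlib

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}
SNAPSHOT_DIR = pathlib.Path("agent-snapshots")  # commit this directory to git
SNAPSHOT_DIR.mkdir(exist_ok=True)

resp = requests.get(f"{API_BASE}/list-agents", headers=HEADERS, timeout=30)
resp.raise_for_status()

for agent in resp.json():
    path = SNAPSHOT_DIR / f"{agent['agent_id']}.json"
    path.write_text(json.dumps(agent, indent=2, sort_keys=True))
    print("wrote", path)
```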

Get Proactive Support:

Consider setting up automated testing that runs sample calls through your agents periodically to detect latency degradation before it impacts production customers.
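
As a concrete starting point, the sketch below places a test call from a Retell number to a line you control and then reads back the call’s end-to-end latency once it finishes. The `POST /v2/create-phone-call` and `GET /v2/get-call/{call_id}` endpoints, the `call_status` values, and the `latency.e2e` shape are assumptions based on Retell’s API reference, and the phone numbers are placeholders; verify all of them against the current docs.

```python
# Sketch of a periodic synthetic test: place a call from a Retell number
# (which uses whatever agent is bound to that number) to a test line you
# control, wait for it to end, then check its end-to-end latency.
# Endpoints, request fields, call_status values, and the latency shape are
# assumptions to verify against Retell's current API reference.
import os
import time

import requests

API_BASE = "https://api.retellai.com"
HEADERS = {"Authorization": f"Bearer {os.environ['RETELL_API_KEY']}"}
LATENCY_BUDGET_MS = 2000  # flag the run as degraded above this median e2e latency


def run_test_call(from_number: str, to_number: str) -> str:
    body = {"from_number": from_number, "to_number": to_number}
    resp = requests.post(f"{API_BASE}/v2/create-phone-call", json=body, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()["call_id"]


def check_latency(call_id: str) -> None:
    # Poll until the call has ended, then inspect its latency summary.
    while True:
        call = requests.get(f"{API_BASE}/v2/get-call/{call_id}", headers=HEADERS, timeout=30).json()
        if call.get("call_status") not in ("registered", "ongoing"):
            break
        time.sleep(10)
    p50 = call.get("latency", {}).get("e2e", {}).get("p50")
    status = "OK" if p50 is not None and p50 <= LATENCY_BUDGET_MS else "DEGRADED"
    print(f"{status}: call {call_id} e2e p50 = {p50}ms")


call_id = run_test_call("+15550000001", "+15550000002")  # placeholder numbers
check_latency(call_id)
```

Run it on a schedule (cron, CI, etc.) against one rigid-mode and one flex-mode agent so the two numbers stay directly comparable over time.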