Critical Bug: Begin node not transitioning because of IVR system

Issue: Agent Not Pressing Digits Due to Late Node Transition

I have an agent that calls numerous large companies and navigates each company’s unique IVR system based on predetermined instructions (“press 1, then press 2”, etc.). Because the agent can’t speak in Press Digit nodes, I use a Logic Split node to handle both press-digit and voice-only IVR systems.

Current Flow Structure:

Begin Node
|
| (Setting: “User speaks first” - waits for IVR to finish speaking)
|
v
Logic Split Node
|
+-- If {{ivr_variant}} = “custom_ivr”
|     |
|     v
|   Custom IVR Instructions Node
|
+-- If {{ivr_variant}} = “custom_press_digit”
|     |
|     v
|   Custom Press Digit Node
|   (Instructions: “Wait till you hear ‘prompt words’, then press one”)
|
+-- Else
      |
      v
    Default Press Digit Node
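
For readers who prefer code to diagrams, the branching above can be approximated as a small TypeScript function. This is only an illustrative sketch: the real flow is configured in the flow builder, not in code, and the `IvrNode` type and `routeIvrVariant` function are hypothetical names introduced here.

```typescript
// Hypothetical model of the Logic Split node; node names are taken from the diagram.
type IvrNode =
  | "Custom IVR Instructions Node"
  | "Custom Press Digit Node"
  | "Default Press Digit Node";

// Mirrors the Logic Split conditions on the {{ivr_variant}} dynamic variable.
function routeIvrVariant(ivrVariant: string | undefined): IvrNode {
  switch (ivrVariant) {
    case "custom_ivr":
      return "Custom IVR Instructions Node";
    case "custom_press_digit":
      return "Custom Press Digit Node";
    default:
      // Else branch: any other value, including an unset variable.
      return "Default Press Digit Node";
  }
}

console.log(routeIvrVariant("custom_press_digit"));
```

The point of the sketch is that the split itself is a pure lookup with no waiting involved, so any delay has to come from the node in front of it.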

The Problem:

This flow works well for most companies, but I’ve discovered a critical bug: sometimes the agent transitions from Begin node to Press Digit node too late. By the time the Press Digit node is active, the IVR has already finished reading the menu options and times out without receiving any input, causing the call to disconnect.

Root Cause:

The Begin node has two modes: “Agent speaks first” or “User speaks first”

  • Agent speaks first: Not appropriate (my agent must listen silently, not speak)
  • User speaks first: Causes the agent to wait for the IVR to finish speaking before transitioning

Many IVR systems speak continuously without pauses, preventing early transition to downstream nodes.

Actual Call Timeline:

0:00 - BEGIN NODE (listening, waiting for IVR to finish)
0:44 - IVR says: “…for Foo, press 1…”
0:44 - TRANSITION TO LOGIC NODE
0:44 - TRANSITION TO PRESS DIGIT NODE
0:48 - IVR times out and disconnects

Result: The trigger phrase (“for Foo, press 1”) is spoken before the Press Digit node becomes active. When the node finally activates, the IVR has already timed out.
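
To make the timing concrete, here is a minimal TypeScript sketch that works out the numbers from the timeline above. The timestamps are copied from the call log; the event labels and helper names (`toSeconds`, `secondsAt`) are made up for illustration.

```typescript
interface TimelineEvent {
  at: string; // "m:ss" timestamp as shown in the call log
  label: string; // hypothetical label for the event
}

// Convert an "m:ss" timestamp into seconds.
function toSeconds(stamp: string): number {
  const parts = stamp.split(":").map(Number);
  const minutes = parts[0] ?? 0;
  const seconds = parts[1] ?? 0;
  return minutes * 60 + seconds;
}

// Events copied from the call timeline.
const timeline: TimelineEvent[] = [
  { at: "0:00", label: "begin_node_entered" },
  { at: "0:44", label: "trigger_phrase_spoken" },
  { at: "0:44", label: "press_digit_node_entered" },
  { at: "0:48", label: "ivr_timeout" },
];

// Look up the time of a labelled event, in seconds.
function secondsAt(events: TimelineEvent[], label: string): number {
  const event = events.find((e) => e.label === label);
  if (event === undefined) throw new Error(`No event labelled ${label}`);
  return toSeconds(event.at);
}

const entered = secondsAt(timeline, "press_digit_node_entered");

// Time spent waiting in the Begin node before the Press Digit node is active.
const stuckInBegin = entered - secondsAt(timeline, "begin_node_entered");

// Time left between entering the Press Digit node and the IVR giving up.
const windowBeforeTimeout = secondsAt(timeline, "ivr_timeout") - entered;

// True only if the trigger phrase starts strictly after the node is entered.
const triggerHeardInNode = secondsAt(timeline, "trigger_phrase_spoken") > entered;

console.log({ stuckInBegin, windowBeforeTimeout, triggerHeardInNode });
// → { stuckInBegin: 44, windowBeforeTimeout: 4, triggerHeardInNode: false }
```

So the agent spends 44 seconds in the Begin node and has only about 4 seconds in the Press Digit node, during which the IVR says nothing new for it to react to.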

Question: How can I configure the Begin node to transition immediately to the Logic Split/Press Digit nodes without requiring the agent to speak, while still allowing the agent to listen for IVR trigger phrases?

Based on the documentation, I can identify the core issue and provide a solution.

The Core Problem

Your Begin node is configured with “User speaks first”, which causes the agent to wait for the IVR to finish speaking before transitioning.(1) Press Digit nodes evaluate whether to press a digit each time the user finishes speaking, and this timing is also affected by the detection delay setting.(1)

The critical issue is that transitions in Press Digit nodes occur only when the user is done speaking.(1)(2) This means your Begin node won’t transition until the IVR stops speaking, which may be too late for time-sensitive IVR menus.

Recommended Solution: Use Logic Split Node Immediately

The Logic Split node transitions immediately when the agent enters it.(3) The time spent in this node is minimal, and the agent does not speak.(3)

Restructure your flow like this:

Begin Node (User speaks first)
|
v
Logic Split Node (immediate evaluation)
|
+-- If {{ivr_variant}} = "custom_press_digit"
|     |
|     v
|   Press Digit Node
|   (with Detection Delay configured)
|
+-- Else
      |
      v
    Other handling nodes

Key Configuration Steps

1. Configure Detection Delay in Press Digit Node

Some IVR systems speak slowly. Set a delay on pauses to ensure the whole IVR menu is captured before the agent makes a decision.(1) The documentation recommends setting this to 1 second.(1)

2. Write Clear IVR Navigation Instructions

Provide clear instructions so the agent knows whether and what digit to press.(1) Include keywords or phrases to listen for, as well as which ones to avoid.(1)

Example prompt structure:

Your goal is to reach the scheduling or appointments department.
Preferred navigation keywords:
• Scheduling
• Appointments
• New patients
• Front desk
Avoid:
• Billing
• Referrals
• Medical records
• Clinical departments
If you are unsure which IVR option is correct:
Choose the option most closely related to scheduling or appointments.

(1)

3. Configure Transition Conditions

Write conditions like “Reached scheduling department” or “Menu repeated 3 times” to handle transitions.(1) Ensure you cover both successful navigation and potential failure scenarios or edge cases.(1)

Why This Works

The Logic Split node evaluates conditions immediately upon entry and branches to the corresponding destination nodes without the agent speaking.(3) This eliminates the delay caused by waiting for the IVR to finish speaking in the Begin node.

Once in the Press Digit node, the detection delay setting gives the IVR time to finish speaking before the agent evaluates what digit to press.(1)

Additional Considerations

If you’re still experiencing issues after implementing these changes, consider:

  • Adjusting the LLM temperature if responses are inconsistent(4)
  • Adding transition finetune examples to help the model understand your expectations(4)
  • Making node instructions more flexible to handle edge cases(4)


Your suggested setup is exactly what I already have. The issue is that the call does not leave the Begin node! Even if I went Begin → Press Digit directly, the call would still linger in the Begin node, because we have no control over its transition.

Hello @codonnell
Can you share the call id that has this issue?

I’m an AI assistant knowledgeable about Retell AI. I don’t have access to specific user accounts or call histories.

However, if you’re experiencing an issue with a Retell AI call, you can retrieve call information using the call ID through the Retell API(1). The call object includes details like call_id, from_number, to_number, call_status, transcript, and other relevant information(1).

To retrieve a specific call, you can use:

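// Uses top-level await, so run this as an ES module.
import Retell from 'retell-sdk';

// Create a client with your API key (placeholder shown here).
const client = new Retell({
  apiKey: 'YOUR_RETELL_API_KEY',
});

// Retrieve one call by its ID (example ID; substitute your own) and print the call object.
const callResponse = await client.call.retrieve('119c3f8e47135a29e65947eeb34cf12d');
console.log(callResponse);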

(1)
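
When comparing a failing call against a working one, it can help to reduce each call object to a few fields. The sketch below is only an illustration under stated assumptions: it relies on the `call_id`, `call_status`, and `transcript` fields mentioned above, and the `CallLike`, `CallClient`, `summarizeCall`, and `summarizeCalls` names are hypothetical helpers, not part of the SDK. The client is passed in as a parameter and is assumed to be structurally compatible with the SDK client.

```typescript
// Minimal shape of a call object, limited to the fields mentioned above.
interface CallLike {
  call_id: string;
  call_status: string;
  transcript?: string;
}

// Anything with a call.retrieve(id) method; assumed to match the SDK client.
interface CallClient {
  call: { retrieve(callId: string): Promise<CallLike> };
}

interface CallSummary {
  callId: string;
  status: string;
  transcriptLines: number;
}

// Reduce a call object to a few fields that are easy to compare side by side.
function summarizeCall(call: CallLike): CallSummary {
  const transcript = call.transcript ?? "";
  return {
    callId: call.call_id,
    status: call.call_status,
    transcriptLines: transcript === "" ? 0 : transcript.split("\n").length,
  };
}

// Retrieve several calls in parallel and summarize each one.
async function summarizeCalls(
  client: CallClient,
  callIds: string[],
): Promise<CallSummary[]> {
  const calls = await Promise.all(callIds.map((id) => client.call.retrieve(id)));
  return calls.map(summarizeCall);
}
```

A call such as `summarizeCalls(client, [failingCallId, workingCallId])` would then return one summary per call, where both IDs are placeholders for your own.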

If you need help with a specific issue, you can:

@Omar_Ashraf
call with issue: call_841c8a91bd93b29da10e1985306 I have more examples if necessary.
working call: call_0d7f68dcc879fa154014fb68575

@Omar_Ashraf Hey, were you able to review those calls? Do you have any updates or suggestions?

@Omar_Ashraf We can’t scale up our calls until this is fixed or a workaround is found. Is there anything we can do here? Can we at least get a response, please?

Hello @codonnell

I checked the call, and the transition happened almost instantly, but the tool didn’t run. This sometimes happens with LLMs like the GPT-5 models.

I would recommend using GPT-4.1 and trying again.

@Omar_Ashraf You’re telling me this call left the Begin node instantly, so why do the transcription logs say it left at 0:44? Are the transcription logs not correct?

Call id: call_841c8a91bd93b29da10e1985306

Node Transition
0:44
previous node: begin
new node: Logic Split Node

It is correct. Let me explain:

The call started → user spoke for 44s → two transitions happened in less than 10ms

→ the call stayed silent for about 3.5s (the tool call could have run during this time)

→ call ended

As I mentioned, this happens because LLMs like GPT-5 are sometimes inconsistent with tool calls.

Try GPT-4.1


@Omar_Ashraf Thank you, I will do some testing around this, and will probably refactor to use a suitable model per node.

Thank you.

@Omar_Ashraf I tried all of the following models, and all of them have the same issue: they do not leave the Begin node. Is there any other way to get the model to leave the Begin node? Like a transition checkbox or something?

Am I correct that the Press Digit node is not picking up on the user’s (i.e. the IVR system’s) options because the flow has not yet entered the Press Digit node?

Would switching the entire conversation from “Rigid Mode” to “Flex Mode” work? Or might that break all kinds of other things? Some of the other IVR systems we call are working OK, but a large portion are not and have this issue.

GPT-4.1(fast): call_d6fd7850c27d7912056ba34f335
GPT-4.1: call_eeaf28f9c0787ecc0af7d7e26b2
GPT-5.2: call_a1cf595bc56037ade3a37742b8e

  • call_aaf4c654148250558e758c84b4d Same as above, but I tried adding a “fine-tuned transition” to go to the logic node as soon as it hears the user (IVR) speak. I understand that is not the intended use of this feature, but I wanted to try it.

Claude 4.6(sonnet): call_08ae11bcb9ed8ea28455555a7ce
GPT-4.1(nano): call_537dee5ba3d92eb9f7827653ab7
Gemini 3.0 Flash: call_0e949907d6ef8494169975f38e9

I’m a little lost here. On one hand, there seems to be something wrong with the Begin node, but maybe I’m missing something. Please advise.

Thank you