Solution Found — DTMF Keypad Input Workaround (Detailed Steps for Others)

Original Problems

Problem 1: After the caller pressed 10 digits, the system went silent and did not proceed until the caller spoke (e.g., said “Hello?”).

Problem 2: All 10 DTMF tones audibly played back to the caller before the agent responded.


Root Cause We Discovered

The Press Digit node is designed for the agent to navigate an IVR (press 1 for sales, etc.) — NOT for capturing user keypad input. Press Digit nodes wait for a speech turn boundary before advancing, which caused:

  1. The flow to hang silently after DTMF input

  2. Audio buffering that flushed when the caller finally spoke


The Solution: Remove Press Digit Nodes Entirely

Retell’s agent-level User DTMF settings already capture keypad input in ANY node. The digits are stored in {{user_dtmf_input}}. We don’t need dedicated Press Digit nodes.


Step-by-Step Implementation

Step 1: Enable Agent-Level DTMF Settings

In Agent Settings → Call Settings → User DTMF Options:

Setting                       Value
User Keypad Input Detection   ON
Digit Limit                   10
Timeout                       4000 ms
Termination Key               OFF (optional)
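
For anyone who sets this up through the API instead of the dashboard, here is a rough sketch of the same settings as a payload. The user_dtmf_options keys (digit_limit, timeout_ms, termination_key) are the ones quoted in the reply further down this thread. The allow_user_dtmf toggle name, the use of None for "OFF", and the overall payload shape are my assumptions, so please check them against the current API reference before relying on this.

```python
import json

# Sketch of the Step 1 settings as an agent-update payload.
# ASSUMPTION: the "allow_user_dtmf" field name, None meaning "OFF", and the
# payload shape are not confirmed anywhere in this thread.
agent_dtmf_settings = {
    "allow_user_dtmf": True,      # User Keypad Input Detection: ON (assumed name)
    "user_dtmf_options": {
        "digit_limit": 10,        # Digit Limit: 10
        "timeout_ms": 4000,       # Timeout: 4000 ms
        "termination_key": None,  # Termination Key: OFF (assumed representation)
    },
}

print(json.dumps(agent_dtmf_settings, indent=2))
```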

Step 2: Remove Press Digit Nodes from Flow

Delete or disconnect any Press Digit nodes that were being used to “capture” user phone numbers. These nodes cause the hanging/buffering issue.


Step 3: Create a Branch Node to Check for DTMF Input

After the node that asks for the phone number, add a Branch node called CHECK DTMF INPUT.

Edge 1: DTMF_VALID

{{user_dtmf_input}} contains exactly 10 digits. Keypad input is complete.

→ Routes to: FORMAT DTMF NUMBER node

Edge 2: NO_DTMF (Else)

{{user_dtmf_input}} is empty, missing, or does not contain 10 digits. Caller likely spoke the number.

→ Routes to: Extract Variables node (for voice input)
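
To make the two edge conditions concrete, this is roughly the decision the branch has to make, written as plain Python. It is only a sketch of the logic: in the flow itself the conditions are the natural-language edge descriptions above, and the function name check_dtmf_input is just something I made up for illustration.

```python
def check_dtmf_input(user_dtmf_input):
    """Return the name of the edge the CHECK DTMF INPUT branch should take.

    Hypothetical helper: it mirrors the edge descriptions in Step 3 and is
    not part of any Retell API.
    """
    digits = (user_dtmf_input or "").strip()
    # DTMF_VALID only when the value is exactly ten ASCII digits.
    if len(digits) == 10 and digits.isascii() and digits.isdigit():
        return "DTMF_VALID"
    # Empty, missing, too short, too long, or containing non-digits:
    # treat it as "caller likely spoke the number".
    return "NO_DTMF"


for sample in ["2896003518", "", None, "28960", "289600351#"]:
    print(repr(sample), "->", check_dtmf_input(sample))
```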


Step 4: Create FORMAT DTMF NUMBER Node

Node Type: Extract Dynamic Variables

Variable Name: phone_number

Variable Description:

Take the DTMF input and format it as E.164.

The caller entered these digits via keypad: {{user_dtmf_input}}

Simply add +1 prefix to the digits.

OUTPUT: +1 followed by the 10 digits exactly as entered.

Example: 
- Input: 2896003518
- Output: +12896003518

Do not change, rearrange, or interpret the digits. Just add +1 prefix.

→ Routes to: Confirm Number node
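
The transformation this node is asked to perform is purely mechanical, so it may help to see it as code. A minimal sketch, assuming the input has already passed the 10-digit check in Step 3. The function name format_dtmf_as_e164 is hypothetical, and in the flow the work is done by the Extract Dynamic Variables node, not by code like this.

```python
def format_dtmf_as_e164(user_dtmf_input: str) -> str:
    """Prefix ten keypad digits with +1, leaving the digits untouched.

    Hypothetical helper that illustrates the Step 4 prompt.
    """
    digits = user_dtmf_input
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected exactly 10 digits, got {digits!r}")
    # No reordering or interpretation: only the country-code prefix is added.
    return "+1" + digits


print(format_dtmf_as_e164("2896003518"))
```

Note that a hard-coded +1 only fits numbers in the North American Numbering Plan (US, Canada, and so on). Callers from other countries would need a different prefix.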


Step 5: Update Validation Branch for E.164 Format

If you have a validation branch that checks the phone number, update the conditions to accept E.164 format:

VALID edge:

The extracted phone_number is in E.164 format (+1 followed by exactly 10 digits, e.g., +12896003518) OR is exactly 10 digits without prefix. Not UNKNOWN, not empty. Proceed to confirm.
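
If it helps to see the VALID condition spelled out precisely, here is the same rule as a regular expression. This is a sketch of the intended check only. The branch itself evaluates the natural-language condition above, and is_valid_phone_number is a name I picked for the example.

```python
import re

# "+1" followed by exactly 10 digits, or exactly 10 digits with no prefix.
_PHONE_PATTERN = re.compile(r"(?:\+1)?\d{10}", re.ASCII)


def is_valid_phone_number(phone_number):
    """Return True when the extracted value satisfies the VALID edge.

    Hypothetical helper: empty values and the literal "UNKNOWN" fail because
    they do not match the pattern.
    """
    if not phone_number:
        return False
    return _PHONE_PATTERN.fullmatch(phone_number.strip()) is not None


for sample in ["+12896003518", "2896003518", "UNKNOWN", "", "+1289600351"]:
    print(repr(sample), "->", is_valid_phone_number(sample))
```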

Step 6: Keep Voice Path for Spoken Numbers

The existing Extract Variables node handles callers who speak their number instead of pressing digits. The Branch node routes voice input through this path automatically.
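
For reference, this is the kind of result the voice path is expected to produce from a transcript. In the flow the extraction is done by the Extract Variables node from its description, not by code. The sketch below only illustrates the target output; the word list and the function name spoken_digits_to_string are my own, and it does not handle phrases like "double five".

```python
import re

# Spoken forms mapped to digits; "oh" is commonly said for zero.
_WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def spoken_digits_to_string(utterance: str) -> str:
    """Collect digits from a transcript, ignoring any other words."""
    collected = []
    # Tokens are either runs of letters or single numeric characters.
    for token in re.findall(r"[a-z]+|\d", utterance.lower(), re.ASCII):
        if token.isdigit():
            collected.append(token)
        elif token in _WORD_TO_DIGIT:
            collected.append(_WORD_TO_DIGIT[token])
    return "".join(collected)


print(spoken_digits_to_string("two eight nine, six oh oh, three five one eight"))
```

The output of a step like this would then go through the same validation as the keypad path.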


Final Flow Structure

ASK FOR PHONE NUMBER (Conversation node)
"What's the best number to reach you at? You can press the digits on your keypad, or say them one at a time — whichever is easier."
       ↓
CHECK DTMF INPUT (Branch node)
       ↓
  ├── DTMF_VALID (10 digits in {{user_dtmf_input}})
  │         ↓
  │   FORMAT DTMF NUMBER (Extract Variables — adds +1 prefix)
  │         ↓
  │   CONFIRM NUMBER
  │
  └── NO_DTMF (Else — voice input)
            ↓
      EXTRACT NUMBER (Extract Variables — parses spoken digits)
            ↓
      VALIDATE EXTRACTED NUMBER (Branch)
            ↓
      CONFIRM NUMBER

Why This Works

  1. Agent-level DTMF settings capture keypad input automatically — no Press Digit node needed

  2. Branch node checks {{user_dtmf_input}} and routes accordingly

  3. Flow advances immediately after 10 digits — no waiting for speech turn

  4. No audio buffering because there’s no “stuck” state

  5. Both input methods supported — keypad and voice work seamlessly


Results

  • Caller presses 10 digits → flow advances immediately → agent confirms number

  • No more silent hanging

  • No more DTMF tone playback

  • Smooth UX for both keypad and voice input


I hope this helps other users who encounter the same issue. The key insight is that Press Digit nodes are for agent IVR navigation, not user input capture.

Thanks for your support in helping us troubleshoot this!


Your analysis is spot-on. The Press Digit node is designed for agents to navigate IVRs, not to capture user keypad input(1). Your solution correctly leverages Retell AI’s built-in DTMF capture capabilities.

Confirming Your Approach

Agent-Level DTMF Settings
You’re right to configure DTMF at the agent level. Retell AI captures user keypad input automatically when you enable it in Call Settings(1)(2). The key settings are:

  • Digit Limit (user_dtmf_options.digit_limit): Maximum number of digits (set to 10 for phone numbers)(1)
  • Termination Key (user_dtmf_options.termination_key): Optional key like # or * to signal completion(1)
  • Timeout (user_dtmf_options.timeout_ms): Time to wait after last digit before timing out(1)

Once configured, DTMF input is captured and added to the call transcript automatically(1). The agent can reference this value when generating responses(1).
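
To illustrate how the three settings interact, here is a small simulation of a digit collector. It models the behavior as described in the list above and is not Retell AI's actual implementation. The function name collect_dtmf, the event format, and the assumption that the timeout only starts counting after the first digit are all hypothetical.

```python
def collect_dtmf(key_presses, digit_limit=10, timeout_ms=4000, termination_key=None):
    """Simulate DTMF collection over (timestamp_ms, key) events in time order.

    Returns (digits, reason), where reason is "digit_limit",
    "termination_key", or "timeout".
    """
    digits = ""
    last_ts = None
    for ts, key in key_presses:
        # A gap longer than timeout_ms after the previous digit ends collection.
        if last_ts is not None and ts - last_ts > timeout_ms:
            return digits, "timeout"
        # The optional termination key (e.g. "#") signals completion.
        if termination_key is not None and key == termination_key:
            return digits, "termination_key"
        digits += key
        last_ts = ts
        # Reaching the digit limit completes the input immediately.
        if len(digits) >= digit_limit:
            return digits, "digit_limit"
    # No more key presses: the timeout eventually fires.
    return digits, "timeout"


presses = [(i * 300, d) for i, d in enumerate("2896003518")]
print(collect_dtmf(presses))
```

With a digit limit of 10 and no termination key, a full phone number completes on the tenth key press without waiting for the timeout, which matches the behavior reported in the original post.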

Extracting DTMF as a Variable
Your approach of using an Extract Dynamic Variable node to capture {{user_dtmf_input}} is the recommended method(3). After the caller enters digits, you can use the Extract Dynamic Variable tool to store that input (for example, an account number or phone number)(3). Once saved, you can reuse it anywhere in the conversation using {{variable_name}}(3).

The Extract Dynamic Variable node supports multiple variable types(4):

  • Text – Any word or sentence
  • Number – A numeric value
  • Enum – A value from a predefined list
  • Boolean – True or false

Branching Logic
Your use of a Logic Split Node (or Branch node) to check whether {{user_dtmf_input}} contains exactly 10 digits is correct(5). The Logic Split Node evaluates conditions immediately when the agent enters it and branches to the corresponding destination(5). This allows you to route DTMF input to one path and voice input to another without the agent speaking(5).

Why Press Digit Nodes Caused Issues

Press Digit nodes are intended for agent-side IVR navigation (e.g., “press 1 for sales”)(1), not for capturing user input. Using them for user input capture creates the exact problems you described:

  1. Silent hanging after DTMF input (waiting for a speech turn boundary)
  2. Audio buffering that only flushes when the caller speaks

Your Solution is Correct

Your final flow structure is exactly how Retell AI is designed to handle this:

  1. Ask for phone number in a Conversation node
  2. Check DTMF input in a Logic Split Node using {{user_dtmf_input}}(5)
  3. Route DTMF input to an Extract Dynamic Variable node to format it(4)
  4. Route voice input to a separate Extract Dynamic Variable node to parse spoken digits(4)

This approach ensures:

  • Flow advances immediately after 10 digits are entered(1)
  • No silent hanging or audio buffering
  • Both keypad and voice input work seamlessly(3)

Your implementation is a great example of using Retell AI’s native DTMF capabilities correctly. Thanks for sharing this detailed walkthrough—it will help others avoid the same pitfall.
