LLM speaks JSON/tool payload aloud before "Talk While Waiting" activates

We’re experiencing a race condition where the LLM outputs raw function call JSON as spoken text to the caller before “Talk While Waiting” intercepts.

Setup:

  • Custom LLM with tool calls (book_appointment, cancel_booking, end_call)

  • “Talk While Waiting” enabled on all tools with natural filler phrases

  • “Payload: args only” enabled on affected tools

What happens:
When the LLM decides to call a tool (especially cancel_booking), it starts generating tokens immediately — including the JSON payload — before Retell’s Talk While Waiting takes over. The caller hears something like:

"{"booking_day":"Friday","notes":"Caller wants to move existing appointment to Monday afternoon","execution_message":"Cancel the Friday appointment"} Let me take care of that for you."

The filler phrase fires after the JSON has already been spoken.

What we’ve tried:

  • Talk While Waiting with natural filler phrases (works ~70% of the time)

  • Payload: args only ON

  • Extensive prompt engineering (“never speak JSON”, “tools are silent internal actions”, “if about to output structured data, output nothing instead”)

  • Dedicated tool safety contract prepended to prompt with examples of good/bad behavior

  • Removed a tool (extract_lead_details) that had no Talk While Waiting option, which was causing parallel tool call JSON leaks

Observations:

  • Happens most often on cancel_booking and when the LLM switches intent mid-thought (e.g., starting to generate a cancel payload before calling book_appointment)

  • Does NOT happen on simple book_appointment calls where the intent is straightforward

  • The execution_message field (which the LLM invents — it’s not in our schema) is frequently spoken aloud

  • Parallel tool calls made the problem worse (removed that tool as a workaround)

Questions:

  1. Is there a way to suppress LLM token output during the tool call generation phase?

  2. Can “Talk While Waiting” be made to intercept before any LLM tokens are streamed to TTS?

  3. Is there a setting to enforce that spoken output and tool invocation are treated as separate output channels?

Any guidance appreciated. This is the last blocker preventing our voice agent from passing QA consistently.

Hi @bill

This is a server-side code issue in your Custom LLM integration, not a Retell platform limitation.

In your WebSocket response handler, when the LLM streams back tool call tokens, your code is sending those tokens as content in the response event to Retell — which then speaks them aloud. Tool call arguments and text content come through different delta fields in the OpenAI streaming response, and your code isn’t properly separating them.

You must check delta.tool_calls and delta.content separately (OpenAI's streaming chunks use the snake_case name tool_calls; your SDK may expose it as toolCalls):

  • When delta.tool_calls has data → accumulate the arguments silently (never send them to Retell as content)

  • When delta.content has data → stream it to Retell as spoken content

  • Only after the tool call is fully assembled, send the message parameter (or your filler phrase) as content
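
The routing described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not your actual handler: it assumes the chunks are plain dicts shaped like OpenAI chat completion stream chunks (choices[0].delta with optional content and tool_calls), and route_stream and the "speak" / "tool_call" event names are hypothetical names made up for this example. How you forward the events to Retell is left out on purpose.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, Any]


def route_stream(chunks: Iterable[Dict[str, Any]]) -> Iterator[Event]:
    """Split a streamed completion into spoken text and assembled tool calls.

    Yields ("speak", text) for every content delta as it arrives, and
    ("tool_call", {"name": ..., "arguments": {...}}) once per tool call,
    only after the stream has ended and the arguments are complete.
    """
    # Tool call fragments are keyed by their "index" so parallel calls
    # do not get mixed together.
    pending: Dict[int, Dict[str, str]] = {}

    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}

        # Text meant for the caller: safe to forward as spoken content.
        text = delta.get("content")
        if text:
            yield ("speak", text)

        # Tool call fragments: accumulate silently, never yield as speech.
        for part in delta.get("tool_calls") or []:
            slot = pending.setdefault(
                part.get("index", 0), {"name": "", "arguments": ""}
            )
            fn = part.get("function") or {}
            slot["name"] += fn.get("name") or ""
            slot["arguments"] += fn.get("arguments") or ""

    # Stream finished: the argument strings are now complete JSON.
    # json.loads raises ValueError if the model produced malformed JSON.
    for index in sorted(pending):
        slot = pending[index]
        yield (
            "tool_call",
            {
                "name": slot["name"],
                "arguments": json.loads(slot["arguments"] or "{}"),
            },
        )
```

In your WebSocket handler you would forward each "speak" event as content in the response event, and on a "tool_call" event send your filler phrase as content first, then run the tool. Because argument fragments are only ever written into pending, they cannot end up in the spoken stream.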

“For OpenAI, it would either give a tool call, or it would give a text response, but not both.” If you’re seeing JSON in spoken output, your code is likely misrouting tool call argument deltas into the content field of the Retell response event.

Regarding “Talk While Waiting”: That feature applies to Retell’s built-in agent frameworks (Single Prompt / Conversation Flow), not Custom LLM WebSocket integrations where you control what gets sent as spoken content.

Recommendations:

  1. Audit your streaming loop to ensure tool call deltas never get sent as content

  2. Lower the temperature for more deterministic tool-calling behavior

  3. Consider using Retell’s built-in agent frameworks (which handle this separation natively) instead of Custom LLM

Thank you