LLM speaks JSON/tool payload aloud before "Talk While Waiting" activates

bill · April 16, 2026, 1:16pm

We’re experiencing a race condition where the LLM outputs raw function call JSON as spoken text to the caller before “Talk While Waiting” intercepts.

Setup:

Custom LLM with tool calls (book_appointment, cancel_booking, end_call)
“Talk While Waiting” enabled on all tools with natural filler phrases
“Payload: args only” enabled on affected tools

What happens:
When the LLM decides to call a tool (especially cancel_booking), it starts generating tokens immediately — including the JSON payload — before Retell’s Talk While Waiting takes over. The caller hears something like:

"{"booking_day":"Friday","notes":"Caller wants to move existing appointment to Monday afternoon","execution_message":"Cancel the Friday appointment"} Let me take care of that for you."

The filler phrase fires after the JSON has already been spoken.

What we’ve tried:

Talk While Waiting with natural filler phrases (works ~70% of the time)
Payload: args only ON
Extensive prompt engineering (“never speak JSON”, “tools are silent internal actions”, “if about to output structured data, output nothing instead”)
Dedicated tool safety contract prepended to prompt with examples of good/bad behavior
Removed a tool (extract_lead_details) that had no Talk While Waiting option, which was causing parallel tool call JSON leaks

Observations:

Happens most often on cancel_booking and when the LLM switches intent mid-thought (e.g., mentally generating a cancel payload before calling book_appointment)
Does NOT happen on simple book_appointment calls where the intent is straightforward
The execution_message field (which the LLM invents — it’s not in our schema) is frequently spoken aloud
Parallel tool calls made the problem worse (removed that tool as a workaround)

Questions:

Is there a way to suppress LLM token output during the tool call generation phase?
Can “Talk While Waiting” be made to intercept before any LLM tokens are streamed to TTS?
Is there a setting to enforce that spoken output and tool invocation are treated as separate output channels?

Any guidance appreciated. This is the last blocker preventing our voice agent from passing QA consistently.

shaw · April 16, 2026, 3:02pm

Hi @bill

This is a server-side code issue in your Custom LLM integration, not a Retell platform limitation.

In your WebSocket response handler, when the LLM streams back tool call tokens, your code is sending those tokens as content in the response event to Retell — which then speaks them aloud. Tool call arguments and text content come through different delta fields in the OpenAI streaming response, and your code isn’t properly separating them.

Handle delta.tool_calls and delta.content separately (the OpenAI streaming delta names the field tool_calls, in snake case):

When delta.tool_calls has data → accumulate the name and argument fragments silently (never send them to Retell as content)
When delta.content has data → stream it to Retell as spoken content
Only after the tool call is fully assembled, send the message parameter (or your filler phrase) as content

“For OpenAI, it would either give a tool call, or it would give a text response, but not both.” If you’re seeing JSON in spoken output, your code is likely misrouting tool call argument deltas into the content field of the Retell response event.
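
The routing described above can be sketched roughly as follows. This is a minimal illustration, not Retell's or OpenAI's code: it assumes chunks shaped like OpenAI chat completion stream chunks (choices[0].delta.content and choices[0].delta.tool_calls, with each tool call fragment carrying an index and a function.name / function.arguments piece). The names route_stream, send_content and fillers are hypothetical; send_content stands in for whatever your server uses to push spoken content into the Retell response event.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def route_stream(
    chunks: Iterable[Dict[str, Any]],
    send_content: Callable[[str], None],  # hypothetical: forwards text to be spoken
    fillers: Dict[str, str],              # hypothetical: tool name -> filler phrase
) -> List[Dict[str, Any]]:
    """Speak text deltas; buffer tool call deltas without ever speaking them."""
    pending: Dict[int, Dict[str, str]] = {}  # partial tool calls keyed by index

    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}

        # Text channel: the only data that is ever sent as spoken content.
        text = delta.get("content")
        if text:
            send_content(text)

        # Tool channel: name and argument fragments are accumulated silently.
        for tc in delta.get("tool_calls") or []:
            slot = pending.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""

    # Stream finished: parse each assembled call, then speak its filler phrase.
    calls: List[Dict[str, Any]] = []
    for index in sorted(pending):
        slot = pending[index]
        # json.loads raises ValueError if the model produced malformed arguments.
        args = json.loads(slot["arguments"] or "{}")
        calls.append({"name": slot["name"], "arguments": args})
        filler = fillers.get(slot["name"])
        if filler:
            send_content(filler)
    return calls


# Usage with a hand-written stream: the JSON arrives in fragments on the
# tool channel, and only the filler phrase reaches the spoken channel.
fake_stream = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "cancel_booking", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"booking_day\":"}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Friday\"}"}}]}}]},
]
spoken: List[str] = []
calls = route_stream(
    fake_stream, spoken.append,
    {"cancel_booking": "Let me take care of that for you."},
)
```

Because the argument fragments only ever land in the pending buffer, there is no code path by which they can reach send_content, which is the property to look for when auditing an existing handler.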

Regarding “Talk While Waiting”: That feature applies to Retell’s built-in agent frameworks (Single Prompt / Conversation Flow), not Custom LLM WebSocket integrations where you control what gets sent as spoken content.

Recommendations:

Audit your streaming loop to ensure tool call deltas never get sent as content
Lower temperature for more deterministic tool calling behavior
Consider using Retell’s built-in agent frameworks (which handle this separation natively) instead of Custom LLM

Thank You
