Bug: Tool-call instruction template is read aloud as part of the agent's speech (Now you are invoking function tool ...)

andrew3 · April 29, 2026, 9:22pm

Severity: High — caller hears English system-prompt text in the middle of a Japanese conversation, including the literal tool ID.

Summary

When an agent invokes a custom tool with speak_during_execution: true, the instruction template that tells the LLM what filler to say is concatenated into the agent’s spoken output instead of being consumed. The English system text is then played aloud through TTS, sometimes twice in the same turn.

Reproduction

Agent ID: agent_2c260cc96dc5143125a5664790
Conversation flow: conversation_flow_963b78c1fdbd (single custom tool: rag_lookup_<supplier_id>)
Voice: Cartesia (Aiko)
LLM: Retell-managed (GPT-5.1)
Language: multi
speak_during_execution: true on the tool node
Call type in our reproduction: web call
Caller utterances: Japanese
Frequency: at least 2 of ~8 test calls in one test session

Reference calls

Example 1 — short leak

call_69cf7c1215dfefe7eb4005e9ab1
https://dashboard.retellai.com/call-history?history=call_69cf7c1215dfefe7eb4005e9ab1

Excerpt of what the caller heard (one continuous agent utterance):

▎ お子様が０歳で、プールをご利用いただけるかどうかでございますね。こちらを確認いたしますので、少々お待ちくださいませ。
▎ Now you are invoking function.
▎ お子様が０歳でプールをご利用いただけるかについてでございますね。こちらを確認いたしますので、少々お待ちくださいませ。

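The English string “Now you are invoking function.” is spoken aloud, sandwiched between two paraphrases of the same Japanese filler line (roughly: “You’re asking whether your 0-year-old child can use the pool. I’ll check on that, so please wait a moment.”).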

Example 2 — full instruction-template leak

call_7dec8e72f3a888115a1aaef8ac6
https://dashboard.retellai.com/call-history?history=call_7dec8e72f3a888115a1aaef8ac6

Excerpt:

▎ レストランでの喫煙についてでございますね。館内のルールをお調べいたしますので、少々お待ちくださいませ。
▎ Now you are invoking function tool rag_lookup_b28fdb84-b7e8-5f97-bab4-324df5d6e43f, and you should come up with what to say to user during execution given this instruction: Acknowledge the caller’s question by briefly paraphrasing what
▎ they asked about, then let them know you’re looking it up. Keep it natural and conversational. Example: “Let me check on [topic] for you, one moment please.”
▎ レストランでの喫煙についてでございますね。館内のルールをお調べいたしますので、少々お待ちくださいませ。
▎ Now you are invoking function tool rag_lookup_b28fdb84-b7e8-5f97-bab4-324df5d6e43f, and you should come up with what to say to user during execution given this instruction: …

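In this call the leak is the complete speak_during_execution instruction template, including the literal tool ID and the prompt’s own example sentence (“Let me check on [topic] for you, one moment please.”). It plays through twice in the same utterance, each time right after the Japanese filler (roughly: “You’re asking about smoking in the restaurant. I’ll look up the facility rules, so please wait a moment.”).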

What we’d expect

The speak_during_execution instruction is a system-level directive telling the LLM what filler to produce; only the LLM’s response to that instruction (the filler itself) should reach the TTS. The instruction text and the tool ID should never be spoken.
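To make that expectation checkable on our side, here is a minimal sketch of a scan we could run over exported agent utterances. It assumes the leak always starts with the words seen in the two reference calls; the function name find_template_leaks and the returned dict shape are our own inventions and do not correspond to any Retell API.

```python
import re

# Opening words of the leaked directive, as heard in both reference calls.
LEAK_MARKER = "Now you are invoking function"

# Captures the tool ID when the fuller form of the template leaks
# (assumption: the ID runs up to the next whitespace or comma).
TOOL_ID_PATTERN = re.compile(r"Now you are invoking function tool ([^\s,]+)")


def find_template_leaks(utterance: str) -> dict:
    """Report how often the directive appears and which tool IDs it exposed."""
    return {
        "leak_count": utterance.count(LEAK_MARKER),
        "tool_ids": sorted(set(TOOL_ID_PATTERN.findall(utterance))),
    }


# Usage with the short leak from Example 1 (Japanese filler abbreviated).
example_1 = "少々お待ちくださいませ。 Now you are invoking function. 少々お待ちくださいませ。"
print(find_template_leaks(example_1))  # {'leak_count': 1, 'tool_ids': []}
```

A non-zero leak_count on any agent utterance would mean the expectation above was violated for that turn.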

What we’d like investigated

Why is the speak_during_execution instruction text being forwarded to TTS along with the LLM’s filler response?
Is this consistent across LLM versions (GPT-5.1 vs. GPT-4.1)? We have a hunch this regressed with GPT-5.1.
Is there a way to suppress the instruction echo — e.g. a flag on the tool, or a known-good prompt template — until the upstream fix lands?
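For the model-comparison question, a rough sketch of how we plan to tally leaks per LLM version once we have more test calls. The record shape (a dict with "model" and "agent_text" keys) and the function name leak_rate_by_model are hypothetical; the transcripts would have to be exported by hand, and nothing here calls a Retell API.

```python
from collections import defaultdict

# Opening words of the leaked directive, as heard in both reference calls.
LEAK_MARKER = "Now you are invoking function"


def leak_rate_by_model(calls):
    """Return {model: (leaked_calls, total_calls)} for an iterable of call records.

    Each record is a hypothetical dict with "model" and "agent_text" keys.
    """
    counts = defaultdict(lambda: [0, 0])  # model -> [leaked, total]
    for call in calls:
        tally = counts[call["model"]]
        tally[1] += 1
        if LEAK_MARKER in call["agent_text"]:
            tally[0] += 1
    return {model: tuple(tally) for model, tally in counts.items()}
```

Feeding it one record per test call, tagged with the model used, would give a simple leaked/total pair for each version to compare.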

mark1 · April 29, 2026, 9:25pm

Hey @andrew3

Thank you for the details. I’ve forwarded them to our team for review.

We’ll get back to you as soon as we have an update.

Best regards

mark1 · April 30, 2026, 1:03pm

Hey @andrew3

GPT-5.1 is meaningfully more prone to echoing the system directive verbatim than GPT-4.1, especially in non-English conversations, so we believe your hunch is correct.

The team checked the logs for both of these calls, and the leak appears to be an LLM hallucination echoing the prompt rather than a system bug. We’d recommend going back to GPT-4.1 for now.

The team can see that our prompt for tool calls is a bit unclear, so they will work on updating it. The fix should be done by today.

Thank you

andrew3 · April 30, 2026, 6:47pm

Thanks for the quick update! We’ll use GPT-4.1 in production and keep testing GPT-5.1 to see whether your fix improves things.
