Agent says transition calls aloud and speaks Chinese instead of transitioning

Instead of transitioning to the appropriate node, the agent will say aloud things like these:

  • {"tool_uses":[{"recipient_name":"functions.transition_to_gather_info","parameters":{}}]}
  • functions.transition_to_gather_info 大发彩票快三commentary to=functions.transition_to_gather_info 天天送彩票json{} 大发彩票网

We have experienced the error with gpt-5.4.

Hey @rodrigo.christiansen, can you share the Agent ID and the Call IDs where you faced this issue?

Thank You

@shaw Here’s a sample of calls with the first issue:

  • call_abdd66c3af9f8d4c96447c98671
  • call_dc1320fb8c707894e2510935793
  • call_4195d1a578137befa837cfa637a
  • call_f1411d76a6b90d88d99c5c0bc00
  • call_e3df6df3400fda624ac336a3ff7
  • call_d918c211ef71ee0ad8949ee531c
  • call_1bf08b709f151ec2d8d2657b092

And another set with the second issue:

  • call_99bc4283d49e935097c0b675dbb
  • call_c1abe58a72261c1d2e6829e8056
  • call_6d9673467182aa5c15c411ba302
  • call_a9bd7d2e8b09db4f43ba73bc346
  • call_acafeee81ebbbb1ccf2b73a78fd
  • call_29c88136b067677bc100c11f31d
  • call_3bfa30166c19c68de6b32dcd49e

Hey @rodrigo.christiansen I have escalated your issue to the team.

Hey @rodrigo.christiansen, both symptoms (the spoken JSON / commentary to=functions.* tokens and the CJK bursts) look like the same underlying failure mode.

What we found on the sampled calls (e.g. call_abdd66c3af9f8d4c96447c98671, call_99bc4283d49e935097c0b675dbb):

  • They ran on gpt-5.4 via OpenAI’s Responses API. The strings being spoken aloud ({"tool_uses":[{"recipient_name":"functions.transition_to_gather_info"...}]} and commentary to=functions.transition_to_gather_info) are OpenAI’s internal tool-call channel markers. They should land in the structured tool-call output, but the model is occasionally emitting them as plain assistant text, so TTS reads them out.
  • The transition_to_<state> names match how our stateful-LLM engine auto-names transitions between your states, so that part is expected — the bug is that the model is verbalizing the protocol token instead of invoking the tool.
  • The Chinese-character bursts (大发彩票 / 天天送彩票) co-occur in the same off-distribution turns and appear to come from the model as well — none of those strings exist anywhere in our code, and your agent is configured en-US with 11labs-Kate, so there’s no path inside Retell that could introduce them. We’re attributing them to the provider by elimination; we have not yet extracted the raw model payload for one of those turns to quote verbatim.
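If you want to triage your own transcripts for this, here is a minimal Python sketch that flags an utterance as a leaked tool call and recovers which transition the model was apparently trying to invoke. It is not Retell code: the function name leaked_tool_name and both patterns are assumptions derived only from the two strings quoted in this thread, so other leaked shapes may slip through.

```python
import json
import re
from typing import Optional

# Assumed pattern: the "to=functions.<name>" marker seen in the spoken output.
_COMMENTARY_RE = re.compile(r"to=functions\.([A-Za-z0-9_]+)")
# Normalise curly quotes that some renderers introduce, so the JSON can be parsed.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"'})


def leaked_tool_name(utterance: str) -> Optional[str]:
    """Return the tool name if the utterance looks like a leaked tool call, else None."""
    text = utterance.translate(_QUOTES).strip()

    # Shape 1: {"tool_uses":[{"recipient_name":"functions.<name>", ...}]}
    if text.startswith("{") and '"tool_uses"' in text:
        try:
            payload = json.loads(text)
            name = payload["tool_uses"][0]["recipient_name"]
            return name.split("functions.", 1)[-1]
        except (json.JSONDecodeError, KeyError, IndexError, TypeError, AttributeError):
            pass  # not a clean envelope; fall through to the marker check

    # Shape 2: "... commentary to=functions.<name> ..."
    match = _COMMENTARY_RE.search(text)
    return match.group(1) if match else None
```

A non-None result on an agent utterance means the transition was spoken rather than executed, which is the first symptom reported above.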

Recommended next steps (workarounds while we don’t have a fix for the model behavior itself):

  1. Move the agent off gpt-5.4. You mentioned gpt-4.1 reproduces it too, so I’d suggest trying a different family — e.g. gpt-4o or claude-sonnet-4 — rather than another gpt-5.x variant. We have not specifically reproduced or cleared those alternatives against this exact failure mode, so please share a few new call IDs after the swap and we’ll confirm.
  2. Optionally add a guard near the top of the general prompt: “Respond only in English. Never output the strings functions., to=functions., {"tool_uses", or any non-Latin characters.” This doesn’t fix the underlying model behavior, but is worth trying as a mitigation.
  3. The agent we inspected is currently running off its draft version (isPublished=false). Make sure any model change is published so production traffic picks it up.

We’ll also look internally at whether we can detect and strip these Harmony channel markers server-side before TTS for gpt-5.x sessions.
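To make the idea concrete, a pre-TTS filter could look roughly like the Python sketch below. This is an illustration only, not what we run or will necessarily ship: sanitize_for_tts and every pattern in it are hypothetical, built from the leaked strings in this thread, and dropping CJK characters is only reasonable for an agent that is configured English-only.

```python
import re

# Assumed patterns, based only on the leaked strings quoted in this thread.
# Greedy on purpose: drops everything from the envelope start to the last "}".
_JSON_ENVELOPE = re.compile(r'\{\s*"tool_uses".*\}', re.DOTALL)
_CHANNEL_MARKER = re.compile(
    r"(?:commentary\s+)?to=functions\.[A-Za-z0-9_]+(?:\s*json\s*\{\s*\})?"
)
_BARE_FUNCTION = re.compile(r"\bfunctions\.[A-Za-z0-9_]+")
# CJK punctuation, kana, CJK ideographs, and fullwidth forms.
_CJK = re.compile(r"[\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef]+")


def sanitize_for_tts(text: str) -> str:
    """Remove leaked tool-call markers and CJK bursts; return what is left to speak."""
    text = _CJK.sub(" ", text)
    text = _JSON_ENVELOPE.sub(" ", text)
    # Channel markers must be removed before bare "functions.<name>" tokens,
    # otherwise a dangling "to=" would be left behind.
    text = _CHANNEL_MARKER.sub(" ", text)
    text = _BARE_FUNCTION.sub(" ", text)
    return " ".join(text.split())  # collapse leftover whitespace
```

An empty return value means the whole turn was protocol noise and should not be sent to TTS at all. Note that a filter like this only hides the symptom: the transition itself still has not been invoked, so the agent would stay in its current state.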

Thank You