GPT-5.2 (and GPT-4.1) never calls custom tools — what are we missing?

We have a Single-Prompt agent with 9 tools (mix of custom, transfer_call, and end_call types). The agent converses fine but never calls any custom tools — zero
check_customer_status, zero check_service_area, zero save_contact_info. The only tool it occasionally calls is end_call.

Tested on both gpt-4.1 and gpt-5.2 — same behavior on both.

Config:

  • Response engine: Retell LLM (Single-Prompt)
  • start_speaker: agent
  • begin_message: not set (LLM generates greeting)
  • tool_call_strict_mode: false
  • model_temperature: 0.08
  • Knowledge base attached (top_k: 3, filter_score: 0.6)
  • Custom tools point to our webhook URL
  • Tools have execution_message_description set
  • speak_during_execution: true and speak_after_execution: true on all custom tools

What we see:

  • Dynamic variables inject correctly
  • Agent follows the prompt conversationally
  • Agent never attempts any custom tool calls
  • end_call (type end_call) works
  • transfer_to_office (type transfer_call) has never been attempted — agent calls end_call instead when trying to “transfer”

What we’ve tried:

  • Switched models (gpt-4.1 → gpt-5.2)
  • Disabled tool_call_strict_mode
  • Added explicit “YOU MUST call check_customer_status” instructions at the top of the prompt
  • Prompt is ~22K chars with 9 tools defined

Questions:

  1. Is there a config setting that enables/disables custom tool execution that we might be missing?
  2. Does the knowledge base interfere with tool calling?
  3. Does execution_message_description on tools affect whether the model decides to call them?
  4. Is 22K chars too long for the prompt — does tool calling degrade with prompt length?
  5. Has anyone successfully gotten gpt-5.2 to proactively call custom tools in a Single-Prompt agent?

Hi @david3

Based on your setup, the core issue appears to be the combination of a Single-Prompt agent, a very large prompt (~22K characters), and many tools (9 total). Single-Prompt agents are known to have unreliable tool-calling behavior when the prompt becomes too large or too many functions are included. Reliability typically starts to degrade beyond roughly 5 tools or ~1,000 words of prompt, and your configuration (9 tools, ~22K characters) is well above both thresholds.

At the moment, there isn’t any hidden setting that disables custom tool execution: if tools are defined, they are available to the model. However, factors like a large prompt and an active knowledge base can compete for the model’s attention, making tool invocation less consistent. Additionally, fields like execution_message_description only control what the agent says around a tool call; they do not influence whether the model decides to call the tool — that decision is driven by the tool’s name and description.
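To illustrate the distinction, here is a minimal sketch of a custom tool definition (field names are the ones you listed in your config; the URL and description text are placeholders, not the exact schema): the description field is what the model reads when deciding whether to call the tool, while execution_message_description only shapes the filler speech during execution.

```json
{
  "type": "custom",
  "name": "check_customer_status",
  "description": "Look up whether the caller is an existing customer. Call this as soon as the caller provides a phone number or account email.",
  "url": "https://example.com/webhook",
  "speak_during_execution": true,
  "speak_after_execution": true,
  "execution_message_description": "Tell the caller you are checking their account."
}
```

A vague description (e.g. just “checks customer status”) gives the model no signal about *when* to call it, which is a common cause of tools never being invoked.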

We suggest switching to a Multi-Prompt Agent or a Conversation Flow Agent.

As a best practice, you should also:

  • Reduce overall prompt size where possible

  • Add clear trigger instructions that reference tools by exact name

  • Define specific conditions for when each tool should be used
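For example, a dedicated trigger section in the prompt might look like the sketch below (tool names are the ones from your post; the trigger conditions are illustrative and should match your actual call flow):

```
## Tool triggers
- When the caller gives their address or ZIP code, call check_service_area before answering availability questions.
- When the caller provides a name and phone number, call save_contact_info immediately.
- When the caller asks to speak to a person, call transfer_to_office; never call end_call in that situation.
- Only call end_call after the caller confirms there is nothing else they need.
```

Pairing exact tool names with concrete, observable conditions is usually what gets the model to call transfer_to_office instead of falling back to end_call.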

This should significantly improve tool-calling reliability and overall agent performance.

If you still face any issues, reach out to us here.

Thank you