How to get a desired call duration to control costs

Hi, we are trying to limit our call duration to control costs, so the agent wraps up the call after a certain amount of time.

For example, 7-10 minutes. We have been trying to use prompting, but it does not seem to be consistent. Is there a way to do this? Any help would be greatly appreciated!

Thanks,

James

You can use the max_call_duration_ms parameter to set a hard limit on call duration(1)(2)(3). The minimum value allowed is 60,000 ms (1 min), and the maximum value allowed is 7,200,000 ms (2 hours)(1)(2)(3). By default, this is set to 3,600,000 ms (1 hour)(1)(2)(3).

For your 7-10 minute target, you would set this to 420,000 ms (7 minutes) or 600,000 ms (10 minutes)(1)(2).

When creating or updating an agent, include:

{
  "max_call_duration_ms": 600000
}

(1)(2)

This will force-end the call when the duration limit is reached(1)(2)(3).
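The conversion is simply minutes × 60 × 1000. A minimal sketch of building the update payload (the `max_call_duration_ms` key matches the documented parameter; the helper name is illustrative):

```python
def minutes_to_ms(minutes: int) -> int:
    """Convert a call-duration target in minutes to milliseconds."""
    return minutes * 60 * 1000

# Build the agent-update payload for a 10-minute hard cap.
payload = {"max_call_duration_ms": minutes_to_ms(10)}

print(payload)  # {'max_call_duration_ms': 600000}
```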

Important notes:

  • After updating this setting, you need to publish the agent version on your phone number for the change to take effect(4)
  • The max_call_duration_ms setting results in an abrupt hangup by default(5)
  • Some users have reported that prompting the LLM to gracefully end calls before the time limit (e.g., “The call should not exceed 10 minutes…gracefully summarize…and conclude the call efficiently by invoking the end_call function”) can help, though this approach may not be entirely reliable(5)

The hard limit via max_call_duration_ms is your most reliable option for controlling costs, though it won’t provide a graceful wrap-up(1)(2)(3)(5).

The hard limit cuts off a call or conversation abruptly, which we don't want. Is there another way besides max call duration, please?

Based on the available sources, there isn’t a built-in Retell AI feature that provides a graceful call wrap-up before a time limit besides max_call_duration_ms, which does result in an abrupt hangup(1).

However, community members have explored workarounds:

Prompt-based approach:
Some users have tried adding time awareness to the prompt, such as: “The call should not exceed 10 minutes. If the conversation approaches the 10 minute mark, gracefully summarize the key points, thank the lead for their time, and conclude the call efficiently by invoking the end_call function”(1).

One user reported this seemed to work, though another user couldn’t replicate the behavior and suspected it might be a hallucination(1). This approach appears inconsistent, which aligns with what you’re experiencing.

Proposed webhook approach:
One community suggestion was to use a webhook that monitors call duration externally and triggers a state transition to an end_call state when the timer expires(1). However, the same user noted that “there is no listening webhook feature” that would enable this(1).

Unfortunately, the official documentation doesn't provide a native solution for gracefully ending calls based on duration. The max_call_duration_ms parameter remains the only documented method for enforcing time limits(2)(3), but it does end the call abruptly(2)(3).

If you’d like, I could try to provide a general answer about potential approaches, though they wouldn’t be officially documented Retell AI features. Would that be helpful?

Yes please that would be good

Hello @MMaggie,

The {{session_duration}} variable returns the duration of the current call. You can prompt the agent before each response to check whether the duration exceeds 10 minutes. If it does, the agent should inform the user that the call must end, and then invoke the end_call function.

Hi Omar, thanks for the response. When the agent checks the timing, does this cause any delay, or does she say out loud that she's checking a timer? We also need to schedule the next call before ending. So if she checks the timer and it is at, for example, 10 minutes, can we then include scheduling the next call in our wrap-up?

Hi Omar,

I have the same desire as MMaggie: to control costs, I would like the agent to gracefully end the call after a certain duration (e.g. 10 minutes).

Based on your suggestion, I inserted the following into my LLM prompt:

## Conversation Flow
1. **Greeting:**
   - ...
2. **Main Chat:**
   - ...
3. **Closing:**
    - If the Current Call Duration ({{session_duration}}) is more than 10 minutes, concisely say you need to go, thank them, and use the end_call function to end the call.

This has been live for 1 day and does not seem to be working at all. Multiple calls today reached the max duration of 15 minutes, triggering an abrupt hang-up. The agent has also ended calls shorter than 10 minutes.

In summary, it doesn’t seem like the LLM is able to make use of session_duration at all. Any ideas?

Hello @Buckets_Blake

This will mainly depend on your agent prompting.

Can you share some call IDs so I can check them?

Hi Omar, here are some call ids from today:

  • call_7cb04b003f4f3dc3319a5463759 - 15-minute call where the agent did not initiate end_call
  • call_28653da7286c3624177b5c58a05 - 15-minute call where the agent did not initiate end_call
  • call_8f342e57851858b967dd81d5f82 - 6-minute call in which the agent initiated end_call, saying "since our time is almost up"

From these examples, it seems like the LLM is either not actually receiving session_duration or is for some reason unable to make use of it.

Hi Omar, I realized I forgot to tag you in my above response. I would appreciate your help getting the agent to actually use session_duration or any other ideas. Thank you!

The issue here is with your prompting.

You need to prompt it with something like this:

## Response Guidelines

- Before every response, check if {{session_duration}} is greater than 30 seconds.
- If {{session_duration}} > 30 seconds:
  - Say: "The call has gone on for too long, I have to go now."
  - Immediately call the function: end_call
  

But keep in mind that this relies entirely on the LLM to call the function, so it will not work 100% of the time, and reliability varies by model. I find that GPT-4.1 performs better in these cases.
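For intuition on why the wording matters: dynamic variables like {{session_duration}} are substituted into the prompt text before each turn, so the model only ever sees a literal string, not a live timer it can query. Conceptually the rendering looks something like this (a simplified sketch, not Retell's actual templating code):

```python
PROMPT = (
    "## Response Guidelines\n"
    "- Before every response, check if {{session_duration}} is greater than 30 seconds.\n"
    "- If it is, say goodbye and immediately call end_call."
)

def render(prompt: str, variables: dict) -> str:
    """Substitute {{name}} placeholders with their current string values."""
    for name, value in variables.items():
        prompt = prompt.replace("{{" + name + "}}", value)
    return prompt

rendered = render(PROMPT, {"session_duration": "1 minute 12 seconds"})
print("{{" not in rendered)  # True: the model sees only the literal value
```

Because the model has to compare a string like "1 minute 12 seconds" against a threshold itself, explicit, step-by-step instructions (as in the snippet above this) work much better than a vague "end after 10 minutes".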

The approach you’ve outlined is correct for using dynamic variables and conditional logic in your prompts to trigger function calls.(1)(2)

When you include conditional instructions in your prompt that reference dynamic variables like {{session_duration}}, the LLM will evaluate those conditions and decide whether to call the end_call function.(1)(2)

However, you’re absolutely right that function calling reliability depends on the LLM model you choose.(1)(3)

Regarding GPT-4.1 performance: The GPT-4.1 family is available on Retell AI with three tiers: GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano.(4)(5) GPT-4.1 offers exceptional reasoning capabilities and sophisticated understanding of nuance and context.(4)

For function calling specifically, Retell AI recommends using a lower temperature value (closer to 0) to improve reliability.(3)(5) Temperature controls randomness: values between 0.0 and 0.3 are highly consistent and deterministic, making them best for function calling and data collection.(3)

There is some community discussion suggesting GPT-4o may still perform better for function calling consistency in certain scenarios.(6) One user reported reverting back to GPT-4o after experiencing issues with GPT-4.1 function calling.(6)

To optimize function calling reliability, consider:(3)

  • Setting temperature between 0.1-0.3 for function calling use cases
  • Enabling Structured Output (when available for your model) to ensure responses follow your defined schemas(3)
  • Using clear, explicit conditional logic in your prompts

Thanks Omar, I will give it a try with prompting in that format and with a different LLM.

Do you know if session_duration is returned in seconds? The dynamic variables docs provide an example of “20 minutes 30 seconds”, but I’ve actually seen my LLM tell people “It’s been 600 seconds”, so I’m thinking that the docs are wrong and session_duration is always in seconds.

Based on the documentation, session_duration should return a human-readable format like “20 minutes 30 seconds”, not raw seconds(1).

The documentation explicitly shows this example format for the session_duration variable(1). If your LLM is receiving “600 seconds” instead, this could indicate:

  1. The LLM is converting the human-readable format to seconds on its own
  2. There’s a discrepancy between what the documentation states and what’s actually being passed
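If you consume this value outside the prompt (e.g., in your own webhook handling) and want to tolerate both formats until the behavior is clarified, a small normalizer could help. A hedged sketch, not part of any Retell SDK:

```python
import re

def duration_to_seconds(value: str) -> int:
    """Normalize '20 minutes 30 seconds', '600 seconds', or '600' to seconds."""
    value = value.strip()
    if value.isdigit():          # bare number: assume it's already seconds
        return int(value)
    total = 0
    for amount, unit in re.findall(r"(\d+)\s*(hour|minute|second)s?", value):
        factor = {"hour": 3600, "minute": 60, "second": 1}[unit]
        total += int(amount) * factor
    return total

print(duration_to_seconds("20 minutes 30 seconds"))  # 1230
print(duration_to_seconds("600 seconds"))            # 600
```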

I’d recommend checking the actual value being passed to your LLM by logging the raw dynamic variables before they reach your prompt. You can verify this by reviewing the call logs in your dashboard(2) or examining the retell_llm_dynamic_variables field in your call data(3).

If you’re consistently seeing numeric seconds rather than the documented format, you may want to reach out through the official support channels to clarify this behavior.