Hi Retell team,
I wanted to share a very specific request around voice naturalness and emotional expressiveness for realtime agents.
I’ve been testing agents from ElevenLabs as well as the OpenAI voice assistant, and the biggest difference (you notice it within seconds) is the level of human-like expression and emotion: natural reactions, subtle shifts in intention, a “smile in the voice,” moments of empathy, assurance, or surprise, and the small expressive nuances that make it feel like you’re speaking to a real person rather than listening to a clean TTS read.
In Retell, even when using ElevenLabs Turbo/Flash v2.5, the voice is solid and usable, but it still sounds noticeably more “flat” compared to experiences like Eleven v3 or OpenAI’s realtime voice. For sales calls, this matters a lot: a more expressive voice reduces the “are you a robot?” reaction and increases trust and engagement throughout the call.
My request/questions:
- Do you plan to support a more emotionally expressive voice model/stack in Retell (e.g., something equivalent to Eleven v3 once it’s available for realtime), or provide advanced options that enable more expressions and emotional variation in the voice output?
- Is there any beta, timeline, or recommended setup to get closer to that level of expressiveness today?
If helpful, I can share short side-by-side audio comparisons (Retell vs ElevenLabs/OpenAI) showing exactly what I mean.
Thanks for everything!
Best regards