The agent’s response “I’m not seeing a matching appointment for that date right now.” started at ~1:38 and the speech track then stalled mid-utterance for roughly 48 seconds before the rest of the sentence (“Could you share the full date, and I’ll check again?”) played.
This agent has Background Sound set to Convention Hall, so while the speech track was stalled the ambient loop kept playing on its own — that’s the “continuous noise” you heard.
The model response itself was generated on time; the stall was downstream of the LLM, in the audio playback stage. We did not retrieve a provider-side error code for the stalled TTS stream, so the exact provider fault is not confirmed.
Recommended next steps on this agent (in Speech Settings):
Turn Background Sound off (or switch from Convention Hall to Coffee Shop / Call Center) — that alone would have made the gap register as silence instead of noise.
If you want to keep ambience, lower Background Sound Volume explicitly rather than leaving it at the default.
Re-run the same “no matching appointment” path a couple of times. If a mid-utterance stall reproduces, send the new call IDs and we’ll dig into the speech-pipeline trace for the stalled stream.
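If you record the re-runs yourself and can get per-chunk timings for the agent's speech, a stall like the one above is easy to spot programmatically. The following is only a rough sketch: the `(start, end)` chunk format, the `find_stalls` helper, and the 2-second threshold are all assumptions for illustration, not something the platform exports. The sample timings are made up to resemble the ~1:38 start and ~48-second gap described above.

```python
from typing import List, Tuple

def find_stalls(chunks: List[Tuple[float, float]],
                min_gap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) pairs where the speech track was silent
    for at least min_gap_s seconds between consecutive chunks."""
    stalls = []
    ordered = sorted(chunks)
    # Compare the end of each chunk with the start of the next one.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end >= min_gap_s:
            stalls.append((prev_end, next_start))
    return stalls

# Hypothetical timings (seconds into the call): speech begins at 98.0 (~1:38),
# stops mid-utterance, and resumes 48 seconds later.
sample_chunks = [(98.0, 100.5), (148.5, 151.0)]
sample_stalls = find_stalls(sample_chunks)
```

Any gap it reports inside a single agent turn is worth sending along with the call ID and timestamp.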
Thanks for testing and sending the second call. The two calls show different signatures, so the next step is to isolate which layer is producing the noise.
What we found on call_7176b8c5c3ebad1bf2ced8c5500:
The first agent turn played cleanly start to finish (no mid-utterance audio gap like the prior call).
No TTS, LLM, or telephony errors in the runtime log for the first 30 seconds.
The new ambient bed (Coffee Shop) is markedly louder and more recognisable than Convention Hall, so it can read as “noise” by itself. We have not isolated a TTS-side artifact on this call.
Two things will help us pin this down:
To audition the ambient sound by itself: there’s no in-product preview today. The cleanest test is to temporarily turn Background Sound off on this agent for a few calls — if the noise disappears with Background Sound off, it’s the ambient bed. (You can clone this agent into a test agent for that experiment so production stays untouched.)
To rule out the speech provider: swap the agent voice from “Marissa” to an ElevenLabs voice (e.g. Hailey, which is already configured as the fallback on this agent). Run a couple of calls. If the noise still occurs with Background Sound off AND a different voice, the artifact is upstream of both.
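Put together, the two tests form a small decision table. Here is a minimal sketch of that reasoning; `locate_noise` and its argument names are hypothetical, and the second argument only matters when the noise persists with Background Sound off.

```python
def locate_noise(noise_with_bg_off: bool, noise_with_other_voice: bool) -> str:
    """Map the outcome of the two isolation tests to the likely source.

    noise_with_bg_off:      noise still heard with Background Sound off
    noise_with_other_voice: noise still heard with Background Sound off
                            AND the voice swapped to a different provider
    """
    if not noise_with_bg_off:
        return "ambient bed"
    if not noise_with_other_voice:
        return "original voice or its speech provider"
    return "upstream of both"
```

Running the Background Sound test first keeps the voice swap from being confounded by the ambient bed.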
Hey @mark1, thanks for the step-by-step breakdown. To run these tests on the cloned agent, do I need to provision and attach a phone number to it, or can I just use a web call via the dashboard/API to reproduce and test it?
You don’t need to attach a phone number — web calls from the dashboard (or the Web Call API/SDK) will reproduce the audio pipeline you’re testing. Background Sound mixing, voice selection, and the TTS path are all the same on web and phone calls, so toggling Background Sound off, or swapping Marissa → Hailey, will be audible on a web call the same way it would on a PSTN call.
One caveat to know:
Web calls use the full-bandwidth audio path (Opus over WebRTC). PSTN calls get downsampled to 8 kHz G.711 μ-law, which can mask or alter some high-frequency artifacts. For the two specific tests we proposed (does the ambient bed itself sound like the “noise”? does swapping the voice eliminate it?), web is sufficient, since both are clearly audible there. But if you ever hear something only on web and want to confirm it survives the PSTN downsample, that final confirmation does need a phone-number call.
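To make the downsampling point concrete: at an 8 kHz sample rate nothing at or above 4 kHz (the Nyquist limit) can be represented, and μ-law then squeezes each sample into 8 bits. The sketch below is illustrative only. It uses the continuous μ-law curve with μ = 255 rather than the segmented table that G.711 actually specifies, and the helper names are made up for this example.

```python
import math

PSTN_SAMPLE_RATE_HZ = 8000
MU = 255  # mu-law parameter associated with G.711

def survives_pstn(freq_hz: float) -> bool:
    """A tone is representable only below half the sample rate (Nyquist)."""
    return freq_hz < PSTN_SAMPLE_RATE_HZ / 2

def mulaw_compress(x: float) -> float:
    """Continuous mu-law companding curve for a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mulaw_roundtrip_8bit(x: float) -> float:
    """Compress, quantise to 256 levels, then expand again."""
    y = mulaw_compress(x)
    q = round((y + 1) / 2 * 255)          # integer code in 0..255
    return mulaw_expand(q / 255 * 2 - 1)  # back to [-1, 1]
```

So an artifact sitting around 6 kHz on a web call cannot survive to the phone side, while one at 3 kHz can, with some added quantisation error.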
Thanks for the clarification on the web calls vs. PSTN downsampling.
We’ve been running tests with the background ambient sound completely turned off to isolate the issue. However, we’ve encountered a strange new artifact during these runs.
I haven’t made any major changes to the setup, but the agent has started randomly laughing quite a few times during the conversation. Not sure if it’s directly related to removing the background sound or something else.
Hey @phonavar, thanks for the new call. Turning Background Sound off didn’t cause this; with ambient off you’re now hearing a TTS-side artifact that the ambient bed was previously masking.
What we found on call_cf244ad989c53636f0d999d9df9:
The text the LLM produced at the 1:30 mark is clean — no “(laughs)”, emojis, or stage directions anywhere in the transcript (we checked every agent turn).
Your global prompt and flow nodes also have zero laughter / playful / warmth directives, so the model isn’t being instructed to do this.
The voice (retell-Marissa, routed to fish_audio) is known to be less stable on the underlying TTS, and at Voice Temperature 1.0 it occasionally introduces stochastic non-verbal prosody on certain passages, which is consistent with what’s being heard as a laugh. Your own v52 draft note flagging Marissa as unstable lines up with that.
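If you want to repeat the transcript check on your own calls, a quick scan along these lines would do it. This is a rough sketch, not the check we ran: the patterns and the `flag_turns` helper are assumptions, and the bracket pattern deliberately over-matches (any short parenthetical gets flagged) so that a person reviews the hits.

```python
import re

# (laughs), [chuckles], *giggles* and any other short bracketed aside
BRACKETED_CUE = re.compile(r"[(\[*][^)\]*]{1,40}[)\]*]")
# Written-out laughter such as "haha" / "Hahaha", plus "lol"
LAUGH_TEXT = re.compile(r"\b(?:ha){2,}\b|\blol\b", re.IGNORECASE)
# Common emoji and pictograph ranges
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def flag_turns(agent_turns):
    """Return the indexes of agent turns that contain a laugh cue,
    a bracketed stage direction, or an emoji."""
    flagged = []
    for i, text in enumerate(agent_turns):
        if (BRACKETED_CUE.search(text)
                or LAUGH_TEXT.search(text)
                or EMOJI.search(text)):
            flagged.append(i)
    return flagged
```

An empty result means the text sent to TTS is clean, which points the finger at the voice rather than the LLM.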
Recommended path (you’ve already half-set this up):
Publish the v52 draft you already have — it swaps Marissa → Cimo, which is markedly more stable.
Lower Voice Temperature from 1.0 to 0.3–0.5 on that version; that range suits a clinical/professional tone better and reduces stochastic prosody.
If it still happens after both changes, try a different voice family (e.g. an ElevenLabs voice) to isolate the TTS provider, and send us the new call_id + in-call timestamp so we can request a provider-side trace.
Regarding the recommendations, I am aware of temperature controls, but there seems to be a mismatch here. In my dashboard, the temperature setting right next to the LLM model is actually set to 0, not 1.0.
I will attach a screenshot of what my dashboard looks like shortly. If there is a separate, dedicated Voice Temperature control somewhere else that I am missing to handle the TTS provider side independently, please let me know where to find it!
In the meantime, I’ll be publishing the draft to switch over to Cimo and will test to see if that stabilizes the audio pipeline.
Hey @phonavar, sorry for the ambiguity in my last message. Those are two different controls:
LLM Temperature (the 0.00 next to the model picker) — controls how deterministic the LLM is. Lower = more deterministic. Yours is correctly set to 0.
Voice Temperature lives in a separate popover: click the gear / settings icon next to the Voice (actor) selector at the top of the agent editor. Inside that popover you’ll see Voice Speed and Voice Temperature (a slider running from Calm to Emotional, default 1.0).
The catch I missed: for voices in our “platform” set — which includes both Marissa and Cimo — the Voice Temperature slider is intentionally hidden in the dashboard. So you didn’t miss anything, and switching to Cimo won’t expose it either.
That means for now your effective levers are:
Publish the v52 (Cimo) swap — do this first as planned. Cimo is the more stable voice in our experience, and the laugh-like artifact is the kind of thing voice choice alone typically resolves.
If Cimo still produces the artifact, switch the voice to a non-platform option, e.g. an ElevenLabs voice. On ElevenLabs voices the Voice Temperature slider IS exposed in that same popover and you can dial it down (0.3–0.5 is a good starting point).
If it’s still happening after both, send a new call_id + in-call timestamp and we’ll request a provider-side trace.