The STT mishears speech or does not transcribe it at all. We ran the same audio through OpenAI and it transcribed it perfectly. Here are two examples:
1st example:
Has your son been to Smiles Dental before, or would this be his first visit?
User
0:34
S’est first, was it? [Actual audio: It’s his first visit]
2nd example:
Could you tell me if your son has been to our clinic before, or if this would be his first visit?
User
1:44
Is his first wizard. [Actual audio: First visit]
call id: call_900c3efe63005a22627e36ca312
agent id: agent_77aa5fa058ae9054976502cf7f
This has been happening on all calls. We have already optimized for accuracy, set background noise to none, disabled denoising, and added boosted words, but none of that helps.
The agent’s language is currently set to “multi”, which tells Deepgram’s nova-3 model to listen for multiple languages simultaneously. This is what is causing the problems:
French misdetection: When the caller said “It’s his first visit,” the multilingual model interpreted it as French (“S’est first, was it?”), which then triggered your agent’s language-detection logic to switch to French. That is why the agent suddenly responded in French.
“Inaudible speech” failures: The multilingual mode uses a 100ms endpointing timer, which is more aggressive than English-only mode. This caused two instances where Deepgram returned empty transcripts before the caller finished speaking.
“Is his first wizard”: Another example of the multilingual model struggling — it misheard “visit” as “wizard.”
The team also noticed that your STT mode is currently set to “fast”, not “accuracy.” You mentioned you’ve already optimized for accuracy. Could you double-check that the setting saved correctly on this agent? (Note: in multilingual mode, both fast and accuracy use the same 100ms endpointing, so this change alone won’t solve the issue.)
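To illustrate why a short endpointing timer can cut a caller off, here is a minimal sketch. It is a simplified model of silence-based endpointing, not Deepgram’s actual implementation, and the function name and word timings are hypothetical. It closes an utterance whenever the silence between two words reaches the endpointing value.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, int, int]  # (text, start_ms, end_ms); timings are made up


def split_on_silence(words: List[Word], endpointing_ms: int) -> List[str]:
    """Group words into utterances, closing the current utterance whenever
    the silence before the next word is at least endpointing_ms."""
    utterances: List[List[str]] = []
    current: List[str] = []
    prev_end: Optional[int] = None
    for text, start, end in words:
        if prev_end is not None and start - prev_end >= endpointing_ms:
            utterances.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        utterances.append(current)
    return [" ".join(u) for u in utterances]


# Hypothetical timings: the caller pauses 150 ms after "his".
words = [("It's", 0, 200), ("his", 220, 400), ("first", 550, 800), ("visit", 820, 1100)]

print(split_on_silence(words, 100))  # the 150 ms pause splits the sentence in two
print(split_on_silence(words, 300))  # the same pause is tolerated
```

With a 100 ms timer, a natural mid-sentence pause is enough to end the turn early, which is consistent with the truncated and empty transcripts seen in these calls.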
Recommended fix:
If your callers are primarily English-speaking, switch the agent’s language from “multi” to “English”. This will:
Use the English-optimized Deepgram nova-3 model, which is significantly more accurate for English speech
Eliminate false French/Spanish language detection
Use the less aggressive English-only endpointing, which reduces empty transcript issues
If you do need French support, consider creating a separate agent configured for French, or using language detection with a higher threshold before switching.
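As a rough idea of what “a higher threshold before switching” could look like, here is a minimal sketch. It assumes your language-detection step gives you a detected language code and a confidence score per utterance, which may not match what your platform exposes. The class name, the 0.9 threshold, and the extra two-utterance streak requirement are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class LanguageGate:
    """Switch languages only after repeated, confident detections.
    All names and default values here are hypothetical."""
    current: str = "en"
    threshold: float = 0.9    # minimum confidence to count a detection
    required_streak: int = 2  # consecutive confident detections needed
    _candidate: str = ""
    _streak: int = 0

    def observe(self, detected: str, confidence: float) -> str:
        # Same language or a low-confidence detection resets any pending switch.
        if detected == self.current or confidence < self.threshold:
            self._candidate, self._streak = "", 0
            return self.current
        # Count consecutive confident detections of the same new language.
        if detected == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = detected, 1
        if self._streak >= self.required_streak:
            self.current = detected
            self._candidate, self._streak = "", 0
        return self.current


gate = LanguageGate()
print(gate.observe("fr", 0.55))  # low confidence: stays "en"
print(gate.observe("fr", 0.95))  # first confident detection: still "en"
print(gate.observe("fr", 0.97))  # second in a row: switches to "fr"
```

With a gate like this, a single misheard utterance such as “S’est first, was it?” would not be enough on its own to flip the agent into French.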
Thanks for the input, but I’m really struggling to get the agent to work as desired. We have spent thousands of dollars so far, but one thing or another keeps failing. Can an expert help us set this up properly?