STT is horrible compared to OpenAI or other STT providers

The STT mishears speech or does not transcribe it at all. We ran the same audio through OpenAI, and it transcribed it perfectly. Here are two examples:

1st example:
Has your son been to Smiles Dental before, or would this be his first visit?

User

0:34

S’est first, was it? [Actual audio: “It’s his first visit”]

2nd example:
Could you tell me if your son has been to our clinic before, or if this would be his first visit?

User

1:44

Is his first wizard. [Actual audio: First visit]

call id: call_900c3efe63005a22627e36ca312

agent id: agent_77aa5fa058ae9054976502cf7f

This has been happening on all calls. We have already set the STT to optimize for accuracy, set background noise to none, turned off denoising, and added boosted words, but none of that helps.

We ran the same audio through OpenAI, and it transcribed it perfectly.

Hey @ariannebrin1555

I’ve escalated the call ID to our team for further review.

We’ll update you as soon as we hear back.

Best regards

Hello @ariannebrin1555

The agent’s language setting is currently set to “multi”, which tells Deepgram’s nova-3 model to listen for multiple languages simultaneously. This is what’s causing the problems:

  1. French misdetection: When the caller said “It’s his first visit,” the multilingual model interpreted it as French (“S’est first, was it?”), which then triggered your agent’s language-detection logic to switch to French. That is why the agent suddenly responded in French.

  2. “Inaudible speech” failures: The multilingual mode uses a 100ms endpointing timer, which is more aggressive than English-only mode. This caused two instances where Deepgram returned empty transcripts before the caller finished speaking.

  3. “Is his first wizard”: Another example of the multilingual model struggling. It misheard “visit” as “wizard.”

The team also noticed that your STT mode is currently set to “fast”, not “accuracy.” You mentioned you’ve already optimized for accuracy. Could you double-check that the setting saved correctly on this agent? (Note: for multilingual mode, both fast and accuracy use the same 100ms endpointing, so this alone won’t solve the issue.)

Recommended fix:

If your callers are primarily English-speaking, switch the agent’s language from “multi” to “English”. This will:

  • Use the English-optimized Deepgram nova-3 model, which is significantly more accurate for English speech
  • Eliminate false French/Spanish language detection
  • Use less aggressive endpointing than the 100ms multilingual timer, which reduces empty transcript issues

If you do need French support, consider creating a separate agent configured for French, or using language detection with a higher threshold before switching.
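
To illustrate the "higher threshold before switching" idea, here is a minimal Python sketch. It is not the platform's or Deepgram's actual logic. The class name `LanguageSwitchGuard`, its parameters, and the assumption that each transcript arrives with a detected language code and a confidence score are all hypothetical, chosen only to show the technique: switch languages only after several consecutive high-confidence detections of the same new language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageSwitchGuard:
    """Hypothetical guard: only switch language after repeated, confident detections."""

    current: str = "en"            # language the agent is currently speaking
    min_confidence: float = 0.9    # assumed threshold; tune for your traffic
    min_consecutive: int = 2       # utterances in a row required before switching
    _candidate: Optional[str] = None
    _streak: int = 0

    def observe(self, detected: str, confidence: float) -> str:
        """Record one utterance's detected language and return the language to use."""
        # Same language as now, or a low-confidence guess: reset any pending switch.
        if detected == self.current or confidence < self.min_confidence:
            self._candidate, self._streak = None, 0
            return self.current

        # Count consecutive confident detections of the same new language.
        if detected == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = detected, 1

        # Switch only once the streak is long enough.
        if self._streak >= self.min_consecutive:
            self.current = detected
            self._candidate, self._streak = None, 0
        return self.current
```

With values like these, a single misdetected utterance such as “S’est first, was it?” would not flip the agent to French on its own. The agent would switch only if the next utterance were also confidently detected as French.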

Thank You

Hi,

Thanks for the input, but I’m really struggling to get the agent to work as desired. We have spent thousands of dollars so far, but one thing or another keeps failing. Can an expert help us set this up properly?