Voice inconsistency - stuttering and filler sounds (aaaa, ehhh, hmmm)?

Hey everyone,

I’m building a customer service agent in Swedish and I’m experiencing some really inconsistent voice quality. Sometimes the agent sounds great and natural, but other times it stutters or makes weird filler sounds like “aaaa”, “ehhh”, “hmmm”, “ja…” mid-sentence.

It’s especially noticeable when:

  • Answering questions about products

  • Reading out phone numbers or addresses

  • Transitioning between topics

My setup:

  • Language: Swedish

  • Voice: 11labs

  • Multi-state system with a knowledge base

What I’ve tried:

  • Cleaning up my prompts

  • Being more explicit about pronunciation

  • Testing different times of day

Has anyone else experienced this? Is this a known issue with Swedish voices, or am I doing something wrong in my prompts?

Any tips would be super appreciated!

Thanks!

As of Thursday, February 5, 2026, here is some guidance on voice quality issues with Retell AI:

Based on the available documentation, there are several voice-related parameters you can adjust to improve consistency and reduce unwanted filler sounds:

Voice Configuration Parameters

You can control voice behavior through several settings in your agent configuration(1)(2):

Voice Temperature: Controls voice stability. Lower values (closer to 0) produce more stable speech, while higher values (up to 2) produce more varied speech(1)(2). For the stuttering, try lowering voice_temperature to something like 0.5 or 0.7(1)(2).

Voice Speed: Controls the speech rate with values ranging from 0.5 to 2(1)(2). Adjusting this might help with transitions and filler sounds(1)(2).

Responsiveness: Controls how responsive the agent is, with values from 0 to 1(1)(2). Lower values make the agent wait longer before it starts responding, which might reduce rushed or stuttered speech(1)(2).

Enable Backchannel: Controls whether the agent interjects short phrases while the user is talking to show engagement(1)(2). If it is enabled, you can adjust backchannel_frequency (0 to 1) and customize backchannel_words(1)(2). If the “ja…” and “hmmm” sounds you hear resemble these interjections, consider disabling backchannel or lowering its frequency(1)(2).
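To make the ranges above concrete, here is a minimal Python sketch that checks a settings dictionary against them before you apply it. The key names are the ones mentioned in this answer; the dictionary layout and the helper name check_voice_settings are hypothetical, and the code does not call any Retell AI SDK or endpoint.

```python
from typing import Any

# Ranges as quoted in this answer. Key names follow the parameters named above;
# verify them against the current Retell AI API reference (assumption).
RANGES: dict[str, tuple[float, float]] = {
    "voice_temperature": (0.0, 2.0),
    "voice_speed": (0.5, 2.0),
    "responsiveness": (0.0, 1.0),
    "backchannel_frequency": (0.0, 1.0),
}


def check_voice_settings(settings: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list every out-of-range or non-numeric value."""
    problems: list[str] = []
    for key, (low, high) in RANGES.items():
        if key not in settings:
            continue  # unset keys keep whatever default the platform uses
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{key}={value} is outside [{low}, {high}]")
    return problems


# Example: steadier settings for the Swedish agent (values are illustrative)
settings = {
    "voice_temperature": 0.7,
    "voice_speed": 1.0,
    "responsiveness": 0.8,
    "enable_backchannel": False,
}
print(check_voice_settings(settings))  # → []
```

Catching an out-of-range value locally is cheaper than discovering it through an odd-sounding test call.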

Text Normalization

Enable normalize_for_speech to convert numbers, currency, and dates to their spoken form before audio generation(1)(2). This is particularly helpful for phone numbers and addresses(1)(2).
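If normalize_for_speech still reads Swedish phone numbers awkwardly, one option is to spell them out in your own code before the text reaches the agent, for example in the response of a tool or webhook you control. A minimal sketch follows; the helper name is hypothetical, and the idea that commas between digit groups produce short pauses is an assumption that depends on the voice.

```python
import re

# Swedish words for the digits 0-9, as used when reading numbers digit by digit.
SWEDISH_DIGITS = {
    "0": "noll", "1": "ett", "2": "två", "3": "tre", "4": "fyra",
    "5": "fem", "6": "sex", "7": "sju", "8": "åtta", "9": "nio",
}


def spell_phone_number(raw: str) -> str:
    """Hypothetical helper: spell a phone number in Swedish, keeping its original grouping."""
    text = raw.strip()
    prefix = "plus " if text.startswith("+") else ""
    # "08-123 45 67" -> ["08", "123", "45", "67"]; [0-9] avoids non-ASCII digits
    groups = re.findall(r"[0-9]+", text)
    # Commas between groups are meant to give the voice short pauses
    spoken = [" ".join(SWEDISH_DIGITS[d] for d in group) for group in groups]
    return prefix + ", ".join(spoken)


print(spell_phone_number("08-123 45 67"))
# → noll åtta, ett två tre, fyra fem, sex sju
```

Keeping the caller’s own grouping (area code, then blocks of digits) usually matches how a Swedish speaker would read the number aloud, rather than forcing a fixed group size.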

Pronunciation Dictionary

You can add a pronunciation dictionary to guide consistent pronunciation of specific Swedish words or phrases(1)(2):

```json
{
  "pronunciation_dictionary": [
    {
      "word": "specific_swedish_word",
      "alphabet": "ipa",
      "phoneme": "ipa_transcription_of_the_word"
    }
  ]
}
```

(1)
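If you have more than a handful of words to fix, it may be easier to keep them in a simple word-to-IPA mapping and generate the entries from it. A minimal Python sketch; the helper name is hypothetical, the mapping holds placeholders you would replace with real IPA transcriptions, and only the "ipa" alphabet from the example above is used.

```python
import json


def build_pronunciation_dictionary(ipa_by_word: dict[str, str]) -> list[dict[str, str]]:
    """Hypothetical helper: turn {word: IPA transcription} into dictionary entries."""
    entries: list[dict[str, str]] = []
    for word, phoneme in ipa_by_word.items():
        # Reject blank entries early instead of sending them to the agent
        if not word.strip() or not phoneme.strip():
            raise ValueError(f"empty word or phoneme for entry {word!r}")
        entries.append({"word": word, "alphabet": "ipa", "phoneme": phoneme})
    return entries


# Placeholder mapping: replace with your own product names and real IPA transcriptions
ipa_by_word = {"specific_swedish_word": "ipa_transcription_of_the_word"}
config_fragment = {"pronunciation_dictionary": build_pronunciation_dictionary(ipa_by_word)}

# ensure_ascii=False keeps å, ä and ö readable in the output
print(json.dumps(config_fragment, ensure_ascii=False, indent=2))
```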

Additional Considerations

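The documentation also lists interaction problems such as the agent interrupting users or continuing to talk after being interrupted(3), and notes that the turn-taking model affects conversation flow(3). It is worth checking whether the hesitations cluster around these turn-taking moments rather than appearing at random.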

I don’t see any documented known issues with Swedish voices in the available sources. The closest related item is a recent community report about Polish speech-to-text (STT) missing words, where the suggestion was to adjust the voice speed if the agent talks too fast(4).

I recommend:

  1. Lower your voice_temperature to increase stability(1)(2)
  2. Enable normalize_for_speech for numbers and addresses(1)(2)
  3. Adjust responsiveness to allow more deliberate speech(1)(2)
  4. Review your backchannel settings(1)(2)
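The four recommendations can be sketched as a small function that compares your current settings with the suggestions and returns only what should change. This is a minimal sketch, not a Retell AI feature: the helper name is hypothetical, the target values 0.7 and 0.8 are illustrative starting points rather than documented defaults, and how you send the patch (dashboard, API, or SDK) is deliberately left out.

```python
from typing import Any


def suggest_voice_patch(current: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return only the settings to change, one step per recommendation.

    The target values (0.7, 0.8) are illustrative starting points, not documented defaults.
    """
    patch: dict[str, Any] = {}

    # 1. Lower voice_temperature for steadier delivery
    temperature = current.get("voice_temperature")
    if temperature is None or temperature > 0.7:
        patch["voice_temperature"] = 0.7

    # 2. Let the platform convert numbers and dates into spoken form
    if not current.get("normalize_for_speech", False):
        patch["normalize_for_speech"] = True

    # 3. Lower responsiveness so the agent waits a little longer before replying
    responsiveness = current.get("responsiveness")
    if responsiveness is None or responsiveness > 0.8:
        patch["responsiveness"] = 0.8

    # 4. Turn backchannel off while testing, to see whether the "ja" fillers disappear
    if current.get("enable_backchannel", False):
        patch["enable_backchannel"] = False

    return patch


current = {"voice_temperature": 1.0, "responsiveness": 1.0, "enable_backchannel": True}
print(suggest_voice_patch(current))
# → {'voice_temperature': 0.7, 'normalize_for_speech': True, 'responsiveness': 0.8, 'enable_backchannel': False}
```

Returning a patch instead of a full configuration keeps each change visible, so you can apply the changes one at a time and hear which one actually removes the fillers.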

If these adjustments don’t resolve the issue, you may want to reach out through the official support channels: