Transcription Mode

The confusion you’re experiencing is common: the latency shown at the top is the overall pipeline latency (ASR + LLM + TTS), not just the ASR setting you changed (1). Even if you adjust the transcription config, the displayed latency may not shift much unless the change directly affects the full processing chain (1).
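To make the point concrete, here is a rough, illustrative model (the stage timings below are made-up numbers, not measurements from your setup): the displayed figure is the sum of all stages, so tuning one stage moves the total only by that stage's contribution.

```python
# Illustrative only: invented stage timings showing why the displayed
# latency (the sum of ASR + LLM + TTS) barely moves when one stage
# is tweaked.
stage_latency_ms = {"asr": 300, "llm": 800, "tts": 400}

def pipeline_latency(stages):
    """Overall pipeline latency is the sum of every stage's latency."""
    return sum(stages.values())

before = pipeline_latency(stage_latency_ms)  # total across all stages
stage_latency_ms["asr"] = 250                # tweak only the ASR stage
after = pipeline_latency(stage_latency_ms)   # total shifts by just 50 ms
```

Shaving 50 ms off ASR here changes the overall number by only about 3%, which is why an ASR-only tweak can look like it "did nothing" in the top-line metric.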

For improving transcription accuracy specifically with English, the documentation recommends using Boosted Keywords (2). You can add custom keywords to enhance the model’s vocabulary, with support for up to 100 custom keywords (2). This is particularly helpful for domain-specific terms or uncommon words that might otherwise be missing from transcripts (2).
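A minimal sketch of building such a config, enforcing the documented 100-keyword limit. The field name `boosted_keywords` is illustrative, not necessarily your platform's actual API field:

```python
# Hypothetical sketch: "boosted_keywords" is an assumed field name, not
# confirmed against the platform's actual transcriber schema.
MAX_KEYWORDS = 100  # documented limit of 100 custom keywords

def build_transcriber_config(keywords):
    """Return a transcriber config dict with boosted keywords attached."""
    keywords = list(keywords)
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} custom keywords are supported")
    return {
        "language": "en",
        "boosted_keywords": keywords,
    }

config = build_transcriber_config(["Kubernetes", "OAuth", "webhook"])
```

Checking the limit client-side gives you a clear error before the request is sent, rather than a rejection from the API.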

Additionally, if transcription quality suffers because sentences are being cut off (the transcript is finalized before the speaker has actually finished), you can switch the transcription mode to one optimized for accuracy (2).

The endpointing setting you’re adjusting controls when the system decides the speaker has finished speaking; it primarily affects transcription quality, not the overall latency metric displayed at the top (3). That displayed latency covers the entire conversation pipeline, so changing ASR endpointing alone won’t dramatically shift the number (3).
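As a rough sketch of the trade-off, assuming a hypothetical `endpointing_ms` parameter (an illustrative name, not confirmed against your platform's settings): a longer silence window reduces premature cut-offs, at the cost of the transcript finalizing slightly later.

```python
# Hypothetical sketch: "endpointing_ms" is an assumed parameter name.
def build_endpointing_config(silence_ms=500):
    """Wait this long after speech stops before finalizing the transcript.

    Longer windows reduce mid-sentence cut-offs; shorter windows finalize
    faster but risk splitting sentences. Either way, this tunes the ASR
    stage only, not the LLM or TTS stages that dominate overall latency.
    """
    if silence_ms <= 0:
        raise ValueError("silence_ms must be positive")
    return {"endpointing_ms": silence_ms}

cfg = build_endpointing_config(silence_ms=800)
```

If sentences are being cut off, raising the window (e.g. from 500 to 800 ms here) is the usual first adjustment to try.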

For further assistance with your specific configuration: