NVIDIA infrastructure upgrade for speed.

Jeff_Scott_Ward · February 5, 2026, 7:03pm

NVIDIA just released a new open source transcription model, Nemotron Speech ASR, designed from the ground up for low-latency use cases like voice agents.

Here’s a voice agent built with this new model. It reaches 24 ms transcription finalization and a total voice-to-voice inference time under 500 ms.

This agent actually uses *three* NVIDIA open source models:

Nemotron Speech ASR
Nemotron 3 Nano 30B in a 4-bit quant (released in December)
A preview checkpoint of the upcoming Magpie text-to-speech model
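
As a rough illustration of how three models like these chain together in one voice-to-voice turn, here is a minimal Python sketch. The stage functions (`transcribe`, `generate_reply`, `synthesize`) and the `voice_turn` helper are hypothetical stand-ins, not the real model APIs or the agent's actual code; only the speech-to-text, LLM, text-to-speech ordering comes from the post.

```python
import time
from typing import Any, Callable, Dict, Tuple


# Hypothetical stand-ins for the three models. These are NOT real APIs.
def transcribe(audio: bytes) -> str:
    """Stand-in for speech-to-text (Nemotron Speech ASR in the post)."""
    return "hello there"


def generate_reply(text: str) -> str:
    """Stand-in for the LLM (Nemotron 3 Nano in the post)."""
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    """Stand-in for text-to-speech (Magpie in the post)."""
    return text.encode("utf-8")


def timed(stage: Callable[[Any], Any], arg: Any) -> Tuple[Any, float]:
    """Run one stage and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = stage(arg)
    return result, (time.perf_counter() - start) * 1000.0


def voice_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run one user turn through ASR -> LLM -> TTS and record per-stage latency."""
    timings: Dict[str, float] = {}
    text, timings["asr_ms"] = timed(transcribe, audio)
    reply, timings["llm_ms"] = timed(generate_reply, text)
    speech, timings["tts_ms"] = timed(synthesize, reply)
    # Voice-to-voice time in this sequential sketch is the sum of the stages.
    timings["total_ms"] = timings["asr_ms"] + timings["llm_ms"] + timings["tts_ms"]
    return speech, timings


if __name__ == "__main__":
    _, timings = voice_turn(b"\x00\x01")
    print(timings)
```

A real low-latency agent would likely stream partial results between stages rather than run them strictly one after another; the sketch only shows where per-stage latencies add up toward a budget like the sub-500 ms figure quoted above.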

These models are all truly open source: weights, training data, training code, and inference code. This is a big deal! Jensen said in the CES keynote yesterday that he expects open source models to catch up to proprietary models this year in a number of categories. NVIDIA is putting their weight behind making this happen. (As Alan Kay said, the best way to predict the future is to invent it.)

The code for this agent is open source too, of course. You can deploy it to production with @modal and @pipecat_ai cloud, or run locally on an @nvidia DGX Spark or RTX 5090.
