Artificial Intelligence · January 25, 2026

LiveKit raises $100M at a $1B valuation as voice AI infrastructure demand grows

LiveKit’s $1 billion valuation says voice AI infrastructure is now a real market

LiveKit has raised $100 million in Series C funding at a $1 billion valuation, with Index Ventures leading and Altimeter, Hanabi Capital, and Redpoint participating. The round comes just 10 months after its previous raise.

The valuation matters, but the more useful signal is where LiveKit sits. It handles the real-time media layer that most AI companies don’t want to build themselves: voice and video systems that need low latency, solid reliability, and decent behavior on bad networks.

That includes ChatGPT’s voice mode, which runs on LiveKit. The company also works with xAI, Salesforce, Tesla, plus customers with less tolerance for failure, including 911 operators and mental health providers. That list says a lot. Voice AI has moved beyond novelty apps and demo bots. It’s showing up in systems where latency, uptime, and privacy problems actually matter.

Why the price looks high, and still plausible

Anyone who has built voice AI knows the model is only part of the experience. The rest comes down to transport, packet loss, jitter, NAT traversal, interrupt handling, and the timing glitches that make a conversation feel either smooth or awkward.

That’s LiveKit’s territory.

The company started in 2021 as an open source low-latency audio and video server, at a time when the pandemic made real-time communications infra look like a solid business. The original problem was general RTC: stable voice and video, low lag, decent developer ergonomics. Then AI arrived, and the same plumbing turned out to fit real-time model interfaces pretty well.

Voice AI is harsher than video calls in a few specific ways:

  • Turn-taking has to feel immediate.
  • Users interrupt constantly.
  • Partial speech recognition matters more than final transcripts.
  • The system has to start speaking before the model has fully finished.
  • Silence matters. Overlap does too.

A lot of teams still underestimate that. They build voice as text chat with audio bolted on. The result usually works on paper and feels bad in practice.

Media systems, not model wrappers

LiveKit’s value is in the media plane, especially around WebRTC, SFU routing, and the plumbing needed to connect clients to AI backends without wasting latency.

A simple version of the flow looks like this:

  1. A client captures microphone audio and sends it over WebRTC, usually encoded with opus.
  2. LiveKit’s infrastructure routes that stream through an SFU or related media services.
  3. The audio is mirrored into AI services for streaming ASR, safety checks, wake-word detection, or other analysis.
  4. The model generates partial responses.
  5. A low-latency TTS system turns those into audio chunks and sends them back quickly enough to feel conversational.
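
To make the shape of that flow concrete, here is a minimal Python sketch of the five steps as an asyncio pipeline. Every stage function is a placeholder, not a LiveKit API; a real deployment would plug in WebRTC ingest, a streaming ASR client, a model, and a TTS engine.

    import asyncio

    async def capture_audio(frames):
        """Step 1: pretend these opus frames arrived from the client over WebRTC."""
        for frame in frames:
            yield frame

    async def route_via_sfu(frames):
        """Step 2: an SFU forwards encoded frames without decoding or mixing them."""
        async for frame in frames:
            yield frame  # fan-out to ASR, moderation, etc. would happen here

    async def streaming_asr(frames):
        """Step 3: emit partial transcripts as audio arrives (stubbed)."""
        text = ""
        async for frame in frames:
            text += frame  # stand-in for incremental decoding
            yield text     # partial hypothesis, not a final transcript

    async def model_partials(partials):
        """Step 4: start generating a reply before the user has finished."""
        async for partial in partials:
            yield f"response so far, given: {partial!r}"

    async def tts_chunks(responses):
        """Step 5: synthesize small audio chunks so playback starts immediately."""
        async for response in responses:
            yield response.encode()  # stand-in for a 40-80 ms opus chunk

    async def main():
        frames = capture_audio(["hel", "lo ", "the", "re"])
        pipeline = tts_chunks(model_partials(streaming_asr(route_via_sfu(frames))))
        async for chunk in pipeline:
            print(f"{len(chunk)} bytes ready for playback")

    asyncio.run(main())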

One key design choice is SFU over MCU.

An MCU decodes and mixes media centrally. That adds cost and delay. An SFU, or Selective Forwarding Unit, mostly forwards packets where they need to go without mixing everything first. For AI voice, that’s usually the better trade-off. A user’s audio may need to hit several services at once: ASR, moderation, logging, maybe a specialized intent classifier. Efficient forwarding saves time and money.

That can sound like infrastructure trivia. It isn’t. When you’re trying to hit a conversational latency target, every decode, re-encode, and extra server hop shows up in the product.
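
A toy illustration of the forwarding idea, using nothing beyond the standard library: one upstream queue of encoded packets is copied to several consumers without ever being decoded or mixed. The consumer names stand in for the hypothetical services mentioned above.

    import asyncio

    async def forward(packet_queue, subscribers):
        """Copy each encoded packet to every subscriber queue; no decode or re-encode."""
        while True:
            packet = await packet_queue.get()
            if packet is None:  # end of stream
                for q in subscribers.values():
                    await q.put(None)
                return
            for q in subscribers.values():
                await q.put(packet)  # ASR, moderation, logging all get the same bytes

    async def consumer(name, q):
        while True:
            packet = await q.get()
            if packet is None:
                return
            print(f"{name} received {len(packet)} bytes")

    async def main():
        upstream = asyncio.Queue()
        subs = {name: asyncio.Queue() for name in ("asr", "moderation", "logging")}
        tasks = [asyncio.create_task(forward(upstream, subs))]
        tasks += [asyncio.create_task(consumer(n, q)) for n, q in subs.items()]
        for pkt in (b"\x01" * 120, b"\x02" * 80, None):
            await upstream.put(pkt)
        await asyncio.gather(*tasks)

    asyncio.run(main())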

The 250 ms problem

For voice AI, the rough target is 250 milliseconds mouth-to-ear. Miss it too often and the interaction starts to drag, even if transcript quality is excellent.

The budget disappears fast:

  • Audio capture and opus encoding: around 10 to 30 ms
  • Network round trip on decent connections: 30 to 80 ms
  • Streaming ASR first partials: 30 to 80 ms
  • TTS first audio chunk: 20 to 60 ms
  • Playback buffer and jitter handling: 30 to 60 ms

That leaves very little room for the model itself.
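
A quick way to see how tight that is: sum the ranges above and compare against the 250 ms target. The numbers are the rough estimates from the list, nothing more.

    # Sum the per-stage ranges and see what is left for the model
    # under a 250 ms mouth-to-ear target.
    BUDGET_MS = 250
    stages = {
        "capture + opus encode": (10, 30),
        "network round trip":    (30, 80),
        "ASR first partials":    (30, 80),
        "TTS first chunk":       (20, 60),
        "playback + jitter":     (30, 60),
    }

    best = sum(lo for lo, _ in stages.values())
    worst = sum(hi for _, hi in stages.values())
    print(f"infrastructure overhead: {best}-{worst} ms")
    print(f"left for the model: {BUDGET_MS - worst} to {BUDGET_MS - best} ms")
    # roughly 120-310 ms of overhead, leaving -60 to 130 ms for the model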

So teams use a few practical tricks. They stream partial intent. They generate short leading phrases. They do incremental synthesis. They keep TTS chunk sizes small, around 40 to 80 ms, so interruption still feels responsive. They add VAD (voice activity detection) and DTX (discontinuous transmission) so silence doesn’t burn bandwidth and tokens. They tune jitter buffers for interactivity instead of smooth, safe playback.
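
Two of those tricks, short leading phrases and incremental synthesis in small chunks, are easy to sketch. The synthesize() function below is a stand-in, not a real TTS API, and the chunk sizing follows the 40 to 80 ms rule of thumb.

    CHUNK_MS = 60  # keep chunks in the 40-80 ms range so interruption lands fast

    def synthesize(text, chunk_ms=CHUNK_MS):
        """Pretend TTS: yield fixed-duration audio chunks for a piece of text."""
        samples_per_chunk = 24_000 * chunk_ms // 1000   # 24 kHz downstream audio
        chunks_needed = max(1, len(text) // 8)          # fake duration estimate
        for _ in range(chunks_needed):
            yield b"\x00" * samples_per_chunk * 2       # 16-bit PCM placeholder

    def respond(partial_texts):
        # A short leading phrase goes out before the model has said anything useful.
        yield from synthesize("Sure,")
        # Then incremental synthesis of whatever the model has produced so far.
        for text in partial_texts:
            yield from synthesize(text)

    for i, chunk in enumerate(respond(["let me check", " that for you."])):
        print(f"chunk {i}: {len(chunk)} bytes, about {CHUNK_MS} ms of audio")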

That’s the engineering behind every voice assistant demo that feels natural.

The hard part is keeping duplex audio, interruption handling, and safety checks inside a latency budget humans won’t notice.

LiveKit has become valuable because it takes a chunk of that work off teams that would rather spend time on inference, business logic, or product UX.

Why OpenAI uses it

OpenAI using LiveKit for ChatGPT voice is a useful signal.

People tend to assume frontier model companies want to own every layer. Often they don’t. Running a global real-time media network is its own discipline, with its own reliability problems, cost structure, and operational mess. The same goes for TURN/STUN infrastructure, regional routing, recording, and endless debugging around corporate firewalls and bad mobile networks.

For a model provider, outsourcing that layer can be the sensible call. It keeps engineering effort focused on inference, safety, and product behavior instead of packet delivery and NAT traversal.

That also says something about the market. Voice AI infrastructure is starting to break out from generic communications platforms. Its requirements don’t look much like old video API businesses.

Twilio’s exit from Programmable Video already pointed in that direction. Broad RTC tooling lost some of its appeal. Narrower products with better economics or tighter product fit looked stronger. AI gave RTC a fresh use case with actual spending behind it, and LiveKit has benefited from that shift.

What developers should watch

If you’re building real-time voice apps, the funding news matters less than the architecture underneath it.

A few things deserve more attention than they usually get.

Barge-in is table stakes

Users interrupt constantly. If your stack can’t stop TTS playback the moment new speech arrives, the app feels clumsy.

That means client-side VAD, fast cancellation paths, and short synthesis chunks. Long audio buffers sound polished until somebody tries to talk over them.
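
The cancellation path itself is simple once playback runs as its own task; the hard part is wiring VAD and the media layer so the signal arrives fast. A minimal asyncio sketch, with the VAD stubbed out:

    import asyncio

    CHUNK_MS = 60  # short synthesis chunks keep the worst-case stop latency small

    async def play_tts(chunks):
        for i, chunk in enumerate(chunks):
            print(f"playing chunk {i}")
            await asyncio.sleep(CHUNK_MS / 1000)

    async def vad_detects_speech(after_ms):
        """Stand-in for client-side VAD firing when the user starts talking."""
        await asyncio.sleep(after_ms / 1000)
        return True

    async def main():
        playback = asyncio.create_task(play_tts([b"..."] * 20))
        await vad_detects_speech(after_ms=150)   # user barges in mid-response
        playback.cancel()                        # stop speaking immediately
        try:
            await playback
        except asyncio.CancelledError:
            print("playback stopped, handing the turn back to the user")

    asyncio.run(main())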

TURN still breaks production systems

Voice demos usually run on clean office networks. Real traffic doesn’t. Hotel Wi-Fi, locked-down enterprise networks, bad carrier routing, weird NAT behavior. If connectivity is sloppy, none of the model work matters.

Managed infrastructure helps, but if you’re self-hosting or running hybrid deployments, redundant TURN relay pools and region-aware routing are basic requirements.
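
As one way to express that, here is a sketch using aiortc, a Python WebRTC implementation. The hostnames, ports, and credentials are placeholders, and a managed platform would normally provision this for you.

    from aiortc import RTCConfiguration, RTCIceServer, RTCPeerConnection

    def ice_servers_for(region):
        """Prefer relays near the caller, keep a second region as a fallback,
        and offer TURN over TLS on 443 for networks that block UDP."""
        fallback = "eu-west" if region != "eu-west" else "us-east"
        servers = [RTCIceServer(urls="stun:stun.example.com:3478")]
        for r in (region, fallback):
            servers.append(RTCIceServer(
                urls=[
                    f"turn:turn-{r}.example.com:3478?transport=udp",
                    f"turns:turn-{r}.example.com:443?transport=tcp",
                ],
                username="placeholder-user",
                credential="placeholder-secret",
            ))
        return servers

    pc = RTCPeerConnection(RTCConfiguration(iceServers=ice_servers_for("us-east")))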

Audio quality has real trade-offs

For upstream ASR, opus at 16 kHz is often good enough and cheaper. For downstream TTS, 24 kHz usually sounds better without blowing the latency budget. 48 kHz can help in richer audio scenarios, but it costs time and bandwidth.

A lot of teams overspec audio quality and then act surprised when responsiveness gets worse.
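
If it helps to see those rules of thumb written down, here is a small sketch of per-direction audio profiles; the numbers come straight from the trade-offs above and are starting points, not requirements.

    from dataclasses import dataclass

    @dataclass
    class AudioProfile:
        sample_rate_hz: int
        reason: str

    UPSTREAM_ASR = AudioProfile(16_000, "wideband speech is enough for recognition and cheaper to ship")
    DOWNSTREAM_TTS = AudioProfile(24_000, "better voice quality without blowing the latency budget")
    RICH_AUDIO = AudioProfile(48_000, "full-band audio, at a real cost in time and bandwidth")

    for name, profile in [("ASR upstream", UPSTREAM_ASR),
                          ("TTS downstream", DOWNSTREAM_TTS),
                          ("rich audio", RICH_AUDIO)]:
        print(f"{name}: {profile.sample_rate_hz} Hz, {profile.reason}")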

Observability has to cover network and model

If users complain about lag, app logs won’t tell you much. Track RTT, packet loss, jitter, start-to-first-token, and start-to-first-audio. Correlate model latency with media metrics. Plenty of teams blame the LLM when the real issue is a jitter buffer that’s too deep or a TURN relay on the wrong continent.
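
A sketch of what that correlation can look like per session. The field names and thresholds are illustrative, not a LiveKit or vendor schema.

    from dataclasses import dataclass

    @dataclass
    class MediaStats:
        rtt_ms: float
        packet_loss_pct: float
        jitter_ms: float
        jitter_buffer_ms: float

    @dataclass
    class ModelStats:
        start_to_first_token_ms: float
        start_to_first_audio_ms: float

    def diagnose(media, model):
        # Thresholds below are illustrative cutoffs, not standards.
        if media.jitter_buffer_ms > 80:
            return "jitter buffer too deep: playback is trading latency for smoothness"
        if media.rtt_ms > 150:
            return "network path is slow: check TURN relay region before blaming the LLM"
        if model.start_to_first_audio_ms - model.start_to_first_token_ms > 120:
            return "TTS first-chunk latency dominates: the model is not the bottleneck"
        return "media path looks healthy: model latency is the place to dig"

    print(diagnose(
        MediaStats(rtt_ms=45, packet_loss_pct=0.5, jitter_ms=12, jitter_buffer_ms=120),
        ModelStats(start_to_first_token_ms=180, start_to_first_audio_ms=260),
    ))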

Privacy gets harder once voice is the interface

LiveKit’s presence in mental health and 911 settings points to the next phase of voice AI infrastructure: compliance-heavy deployments. That means E2EE options, possibly with WebRTC insertable streams, stronger audit trails, data residency controls, and careful decisions about where model inference runs.

This part is still messy. End-to-end encryption and server-side AI processing pull in opposite directions unless the trust boundary is tightly controlled.

The market signal behind the round

A billion-dollar valuation for an infrastructure company doesn’t prove durable value on its own. Plenty of AI plumbing startups are being priced for perfect execution. LiveKit has a better case than most.

It has open source roots, real production usage, a clear strength in low-latency media handling, and demand from voice interfaces across consumer apps, contact centers, healthcare, and automotive systems.

There’s still risk. Real-time infrastructure can get commoditized. Large cloud vendors could push harder. Customers may prefer fewer vendors. Some model providers will pull parts of this stack in-house over time, especially if voice becomes a primary interface and costs keep rising.

Still, the demand is real. If voice agents are going to be common, somebody has to run the media layer well. It’s hard to fake, annoying to maintain, and easy to screw up.

LiveKit’s valuation looks like a bet that this pain point will stick around long enough to support a real company. That seems reasonable. And for teams building voice AI now, the reminder is simple: a lot of what users hear is decided by infrastructure they never see.
