Nvidia's networking business is becoming a second AI infrastructure giant
Nvidia’s networking business is becoming too big to ignore
Nvidia’s GPU business still gets most of the attention. Fair enough. The AI boom runs on its accelerators.
But the company’s networking division is now big enough to reshape the AI infrastructure market. In the last quarter alone, Nvidia reported $11 billion in networking revenue. For the full year, it brought in more than $31 billion. That’s the scale of a major infrastructure company sitting inside a GPU company.
The timing matters. At GTC in mid-March, Nvidia added more pieces to its Rubin AI platform, including updated interconnect, Ethernet photonics, and a new Inference Context Memory Storage system. The direction is obvious. Nvidia wants control of the fabric inside the AI data center, not just the chips plugged into it.
That strategy has been visible since the $7 billion Mellanox acquisition in 2020. The difference now is revenue. Networking has moved well past support-business status.
Why networking matters this much
Large AI systems live or die on communication overhead.
Once training scales across racks, the bottleneck stops being just matrix math on the GPU. It becomes all_reduce, all_gather, reduce_scatter, and the rest of the synchronization work needed to keep thousands of accelerators moving together. If the interconnect adds tail latency, expensive GPUs sit around waiting for stragglers.
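A toy model makes the straggler cost concrete. The sketch below (pure Python, no GPUs required, and with made-up compute and jitter numbers) treats each synchronous step as gated by the slowest rank and shows utilization sliding as the cluster grows:

```python
import random
import statistics

def step_time_ms(num_ranks: int, compute_ms: float = 100.0,
                 jitter_ms: float = 5.0) -> float:
    """One synchronous step: the collective can't complete until the
    slowest rank finishes, so the step is gated by the straggler."""
    per_rank = [compute_ms + random.expovariate(1.0 / jitter_ms)
                for _ in range(num_ranks)]
    return max(per_rank)

def mean_utilization(num_ranks: int, steps: int = 200) -> float:
    """Share of wall-clock time the average GPU spends on useful compute."""
    useful_ms = 100.0
    observed = [step_time_ms(num_ranks) for _ in range(steps)]
    return useful_ms / statistics.mean(observed)

if __name__ == "__main__":
    for n in (8, 256, 4096):
        print(f"{n:>5} ranks -> ~{mean_utilization(n):.0%} utilization")
```

The numbers are fiction; the shape is the point. The bigger the job, the more a fat tail in communication latency costs you.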
Nvidia likes to say the data center is the computer and the network is the backplane. It’s a marketing line, but it also happens to be true.
Inside a single server or pod, NVLink and NVSwitch tie GPUs together with high bandwidth and predictable latency. Across servers, Nvidia is pushing two paths:
- InfiniBand, still the default for tightly coupled training at very large scale
- Spectrum-X Ethernet, aimed at buyers who want AI-tuned Ethernet instead of a separate InfiniBand island
Then there’s the software that decides whether any of this works well: NCCL, UCX, GPUDirect, RDMA transport, congestion control, collective offload. This is where clusters either run cleanly or waste a lot of money.
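To give a flavor of how much behavior lives in that software layer, here's a sketch of the environment knobs teams end up reaching for when collectives misbehave. The NCCL variable names are real settings; the values and the `train.py` launch are placeholders for illustration, not recommendations:

```python
import os
import subprocess

# Real NCCL environment variables; the values here are illustrative
# placeholders that should come from your own fabric, not from this snippet.
nccl_env = {
    "NCCL_DEBUG": "INFO",           # log which transports and algorithms NCCL picks
    "NCCL_DEBUG_SUBSYS": "NET,GRAPH",
    "NCCL_SOCKET_IFNAME": "eth0",   # interface for bootstrap and TCP fallback
    "NCCL_IB_HCA": "mlx5",          # restrict which RDMA NICs NCCL may use
    "NCCL_NET_GDR_LEVEL": "PHB",    # how aggressively to use GPUDirect RDMA
}

# Hypothetical launch of a training script with the tuned environment.
subprocess.run(
    ["torchrun", "--nproc_per_node=8", "train.py"],
    env={**os.environ, **nccl_env},
    check=True,
)
```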
That full-stack control is the point. Nvidia is selling a tuned system.
What developers should care about
A lot of networking coverage gets lost in product categories. That misses the part that matters.
The real question is whether a training or inference cluster can hold high utilization under actual workloads, not polished benchmarks with clean traffic patterns. AI traffic is ugly. Collective operations come in bursts. Tensor parallelism and pipeline parallelism stress links in different ways. Inference has its own problems, especially once context windows get large and key-value cache movement starts eating bandwidth.
Nvidia’s stack goes after that from several angles.
NVLink and the local fabric
Within a node, NVLink is there to make multiple GPUs behave more like a single large memory and compute domain. That matters for model sharding and collectives. If frameworks can move tensors between GPUs quickly and consistently, less time gets burned on staging data and waiting on synchronization.
NCCL sits on top and chooses transports based on where the traffic has to go. That sounds dull until you’ve spent days chasing performance cliffs caused by bad collective routing. In multi-GPU training, transport decisions in software often matter as much as the link speeds in the spec sheet.
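A quick sanity check of the local fabric is cheap. This rough sketch, assuming a box with at least two CUDA GPUs and PyTorch installed, asks whether peer-to-peer access is available and times a single device-to-device copy; there's no warmup, so treat the result as ballpark only:

```python
import time
import torch

# Needs a machine with at least two CUDA GPUs and PyTorch installed.
assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

# Can GPU 0 reach GPU 1 directly over NVLink / PCIe peer-to-peer?
print("P2P access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

# Time a single 512 MB device-to-device copy (no warmup, ballpark only).
x = torch.empty(256 * 1024 * 1024, dtype=torch.float16, device="cuda:0")
torch.cuda.synchronize(0)

start = time.perf_counter()
y = x.to("cuda:1", non_blocking=True)
torch.cuda.synchronize(0)
torch.cuda.synchronize(1)
elapsed = time.perf_counter() - start

gigabytes = x.numel() * x.element_size() / 1e9
print(f"copied {gigabytes:.2f} GB in {elapsed * 1000:.1f} ms "
      f"(~{gigabytes / elapsed:.0f} GB/s)")
```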
InfiniBand still owns the top end
For massive training clusters, InfiniBand remains the safer choice. RDMA cuts CPU overhead. In-network compute and collective offload reduce host pressure and lower latency for the communication patterns large models care about most.
That edge often shows up in tail behavior rather than raw peak bandwidth. The cluster that wins is usually the one with fewer ugly spikes at the 99.9th percentile during all_reduce, not the one with prettier headline numbers.
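Measuring that is straightforward if you look past averages. A minimal sketch, assuming PyTorch with the NCCL backend and a torchrun launch, times repeated all_reduce calls and reports tail percentiles; for a trustworthy p99.9 you would want far more iterations than shown here:

```python
import os
import time

import torch
import torch.distributed as dist

# Launch with torchrun so RANK / WORLD_SIZE / LOCAL_RANK are set for us.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", "0")))

# 128 MB fp16 payload; pick sizes that match your real gradient buckets.
payload = torch.ones(64 * 1024 * 1024, dtype=torch.float16, device="cuda")

samples_ms = []
for _ in range(500):
    torch.cuda.synchronize()
    start = time.perf_counter()
    dist.all_reduce(payload)
    torch.cuda.synchronize()
    samples_ms.append((time.perf_counter() - start) * 1000)

if dist.get_rank() == 0:
    samples_ms.sort()
    def pct(p):  # nearest-rank percentile
        return samples_ms[min(len(samples_ms) - 1, int(p / 100 * len(samples_ms)))]
    print(f"p50={pct(50):.2f} ms  p99={pct(99):.2f} ms  p99.9={pct(99.9):.2f} ms")

dist.destroy_process_group()
```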
That’s why InfiniBand keeps hanging on despite years of predictions that Ethernet would swallow it. Ethernet is cheaper, familiar, and often good enough. For giant tightly coupled jobs, "good enough" can still mean lost throughput and ugly scaling curves.
Spectrum-X is Nvidia’s serious Ethernet push
Nvidia also knows that plenty of enterprises and cloud operators don’t want an InfiniBand-centric operation. Their teams know Ethernet. Their tooling, procurement, and operational habits are built around it.
That’s where Spectrum-X comes in. The pitch is AI-tuned Ethernet with the software and telemetry needed to make RoCE v2 work at scale without turning the network into a science project.
That matters because RoCE always comes with fine print. You have to manage loss, queue behavior, and congestion carefully. PFC, ECN, and algorithms like DCQCN can deliver strong performance, but they also create plenty of ways to end up with deadlock, head-of-line blocking, or fragile tuning if the network team doesn’t know the failure modes.
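One hedged starting point is simply watching the counters that reveal RoCE stress. The sketch below shells out to ethtool -S and greps for pause, ECN, and discard counters; the exact counter names vary by NIC and driver, so the substrings are guesses you will need to adapt:

```python
import subprocess
import sys

# Counter names vary by NIC and driver, so we just grep for likely substrings.
INTERESTING = ("pause", "ecn", "cnp", "discard", "drop", "congest")

def scrape(iface: str) -> None:
    # `ethtool -S` dumps per-driver NIC statistics as "name: value" lines.
    out = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        name, _, value = line.strip().partition(":")
        if any(key in name.lower() for key in INTERESTING):
            print(f"{name.strip():45s} {value.strip()}")

if __name__ == "__main__":
    scrape(sys.argv[1] if len(sys.argv) > 1 else "eth0")
```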
Nvidia is trying to package away that complexity. Some of it can be hidden. Some of it can’t. If you’re buying Ethernet for AI because it seems simpler, be careful. It’s only simpler if the tuning holds up under your traffic.
Photonics and power are now front-and-center
One of the more interesting pieces in Nvidia’s recent announcements is the focus on co-packaged optics and Spectrum-X Ethernet Photonics.
That sounds niche until you look at modern AI pod design. Dense clusters are running into power delivery and cooling limits almost as fast as they run into compute demand. Every electrical hop burns energy. Every extra bit of heat tightens the design envelope.
Moving optical components closer to the switch ASIC reduces SerDes reach and can cut power per bit while improving bandwidth density. In dense AI deployments, optics and switch thermals can decide whether a pod scales cleanly or gets boxed in by power and cooling constraints.
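A back-of-envelope calculation shows why optics power adds up at pod scale. Every number below is an assumption picked only to show the shape of the math, not a vendor figure:

```python
# Back-of-envelope optics power math. Every number below is a made-up,
# order-of-magnitude assumption for illustration, not a vendor spec.
ports = 512                      # 800G ports in a hypothetical AI pod
gbps_per_port = 800
pluggable_w_per_port = 15.0      # assumed pluggable optic + DSP power
co_packaged_w_per_port = 7.0     # assumed co-packaged optics power

def pod_watts(w_per_port: float) -> float:
    return ports * w_per_port

def picojoules_per_bit(w_per_port: float) -> float:
    return w_per_port / (gbps_per_port * 1e9) * 1e12

for label, watts in [("pluggable", pluggable_w_per_port),
                     ("co-packaged", co_packaged_w_per_port)]:
    print(f"{label:12s} ~{pod_watts(watts) / 1000:.1f} kW for optics, "
          f"~{picojoules_per_bit(watts):.1f} pJ/bit")
```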
This shift still doesn’t get enough attention. Networking is taking a bigger share of capex, and a bigger share of the physical design budget too. For some balanced AI builds, networking can approach a quarter of total capital cost. Five years ago, many enterprise buyers would’ve treated that as absurd.
Now it’s normal.
Nvidia is also chasing inference bottlenecks
The new Inference Context Memory Storage platform is worth watching, even if the name is awkward.
Inference at scale has a memory traffic problem. Large-context models spend a lot of time rebuilding or moving prompt state and key-value caches. If more of that state stays close to the fabric and can be fetched efficiently, GPU stalls drop and east-west network churn eases up.
That doesn’t solve inference economics by itself. Memory hierarchy still matters. Fast storage still matters. Scheduler behavior still matters. But it does show where Nvidia sees the next constraint: not just compute, but the movement and placement of model context across the cluster.
For AI engineers building retrieval-heavy systems, multi-turn assistants, or serving stacks with large KV caches, that’s a more useful signal than another vague claim about faster inference.
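A rough sizing exercise shows why that context movement matters. The model shape below is hypothetical, but the arithmetic is the standard per-token KV footprint multiplied out across long contexts:

```python
# Rough KV cache sizing for a transformer decoder. The model shape below is
# a hypothetical example, not any specific product; the formula is the usual
# 2 (K and V) x layers x kv_heads x head_dim x bytes per token.
layers = 80
kv_heads = 8          # grouped-query attention
head_dim = 128
bytes_per_value = 2   # fp16 / bf16

def kv_cache_bytes(context_tokens: int, batch: int = 1) -> int:
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * batch

for ctx in (8_000, 128_000, 1_000_000):
    gb = kv_cache_bytes(ctx) / 1e9
    print(f"{ctx:>9,} tokens -> ~{gb:,.1f} GB of KV cache per sequence")
```

Even with aggressive sharing, state at that scale has to live somewhere and move somewhere, which is exactly the traffic Nvidia is trying to pull closer to the fabric.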
The business impact goes beyond the revenue number
At the obvious level, Nvidia has built a multibillion-dollar networking division.
More important, it can now pressure the rest of the market from both sides. It sells the compute and the fabric, then ties them together with software and reference designs that cut integration pain.
That creates pressure for:
- Cisco, Arista, Broadcom, and HPE, all of which want a serious share of the AI network buildout
- Ethernet standardization efforts like the Ultra Ethernet Consortium, which are trying to make open Ethernet fabrics better suited to AI and HPC traffic
- Cloud and colo operators that would rather keep infrastructure modular and avoid deeper dependence on one vendor’s roadmap
Nvidia’s integrated approach is appealing because it gets clusters working faster. That matters. AI infrastructure projects fail in very ordinary ways: firmware mismatches, bad congestion tuning, flaky collective performance, weak observability, and endless vendor finger-pointing.
A single-stack design cuts down some of those failure modes. It also tightens lock-in.
If you buy Nvidia compute, Nvidia interconnect, Nvidia switches, Nvidia software plumbing, and Nvidia reference architecture, deployment gets easier. Your negotiating position gets worse.
What technical buyers should do with this
If you’re sizing an AI cluster in 2026, the first question is still workload mix, not switch speed.
If you run giant distributed training jobs with heavy collectives, InfiniBand still deserves a serious look. If the workload is more inference-heavy, mid-scale, or tied closely to existing enterprise networking operations, Spectrum-X Ethernet may be the more practical choice.
Either way, a few rules hold:
- Test with traffic that looks like your actual NCCL patterns, not polished vendor benchmarks
- Watch tail latency, not just average throughput
- Treat firmware, drivers, UCX, and NCCL version alignment as part of performance engineering
- Budget for optics, power, and cooling early, because those limits show up faster than many teams expect
- Assume vendor integration will make life easier, then decide how much flexibility you’re willing to give up
Nvidia’s networking unit matters because AI clusters are no longer just servers plus switches. The fabric now shapes training efficiency, inference economics, and even rack design.
That’s why this business is growing so fast. In AI infrastructure, control of the connections is getting close to as valuable as control of the chips.
What to watch
The harder part is not the headline revenue number. It is whether the economics, supply chain, power availability, and operational reliability hold up once teams try to run this gear at production scale. Buyers should treat the GTC announcements as a signal of direction, not proof that cost, latency, or availability problems are solved.