Vera is Nvidia’s CPU designed for agentic AI workloads, including the infrastructure tasks around tool calls, memory, scheduling, I/O, and token handling.

Why is Nvidia calling this a $200 billion opportunity?

Nvidia argues that AI agents will create a large new infrastructure market because much of their runtime depends on CPU-side orchestration, not just GPU model computation.

Who does Vera compete with?

Vera pushes Nvidia further into territory served by Intel, AMD, and cloud providers’ custom chips such as AWS Graviton, Microsoft Cobalt, and other in-house silicon.

Artificial intelligence May 22, 2026

Nvidia points to a new $200B AI infrastructure market


Nvidia’s Vera CPU pitch is about owning the agent runtime, not just selling another chip

Jensen Huang has a new number for Wall Street: $200 billion.

On Nvidia’s latest earnings call, after the company reported $81.6 billion in quarterly revenue and guided for $91 billion next quarter, Huang said Nvidia has found a “brand new” $200 billion total addressable market through Vera, its new CPU for agentic AI workloads.

That’s a large claim, even by Nvidia standards. It also has a clear logic behind it.

Nvidia already dominates the GPU side of modern AI training and inference. Vera is Huang’s argument that the next bottleneck will also sit in the CPU-heavy machinery around AI agents: tool calls, memory, scheduling, I/O, policy checks, and token handling across systems that behave less like one-off chatbot requests and more like fleets of semi-autonomous software workers.

Huang says Nvidia has already sold $20 billion worth of standalone Vera CPUs this year. If that number holds up, the pitch is landing before the product category has fully settled.

Why Nvidia wants the CPU layer

Nvidia introduced Vera in March as a CPU “purpose-built for agentic AI.” It can be sold on its own or paired with Nvidia’s upcoming Rubin GPU platform.

The positioning matters. Nvidia’s AI business is built around GPUs, networking, CUDA, and tightly integrated server platforms. CPUs have mostly belonged to Intel, AMD, and, increasingly, the cloud providers. AWS has Graviton for general compute and has been pushing its own AI silicon. Google has TPUs. Microsoft has Maia and Cobalt. Meta buys from everyone and is designing some of its own hardware too.

The threat is straightforward: if hyperscalers can cut enough cost from AI infrastructure with custom chips, they can reduce their dependence on Nvidia over time. Even if Nvidia keeps the high-end GPU crown, cloud providers don’t want one supplier taking margin from every layer of the AI stack.

Vera pushes Nvidia deeper into the general compute layer around AI systems, especially orchestration-heavy workloads that don’t map cleanly onto GPUs.

Huang’s technical argument is simple enough: GPUs handle much of the model computation, while agents spend a lot of time doing CPU-side work.

That work includes:

calling tools and APIs
managing long-running sessions
retrieving and updating context
running policy checks
executing code or browser tasks
coordinating multiple model calls
serializing and streaming tokens
scheduling workloads across infrastructure
handling I/O-heavy application logic

Anyone building agent systems at scale knows the shape of this problem. The model call is only part of the runtime. The rest looks a lot like distributed systems engineering with a language model sitting in the loop.
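The CPU-side work listed above can be made concrete with a minimal sketch of a single agent loop. It assumes a toy setup: `call_model`, `run_tool`, and the `ALLOWED_TOOLS` policy are hypothetical stand-ins, not any real framework's API. Only one kind of step is model computation. Everything else is orchestration.

```python
import json
from dataclasses import dataclass, field

# Hypothetical policy: which tools this agent may call.
ALLOWED_TOOLS = {"search", "calculator"}


@dataclass
class Session:
    """Long-running session state kept in CPU memory."""
    goal: str
    context: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)


def call_model(session: Session) -> dict:
    """Hypothetical stand-in for a GPU-backed model call: returns a tool request or a final answer."""
    if not session.context:
        return {"tool": "calculator", "args": {"expr": "6 * 7"}}
    return {"final": f"Answer: {session.context[-1]}"}


def run_tool(name: str, args: dict) -> str:
    """Hypothetical stand-in for tool/API execution (CPU and I/O work)."""
    if name == "calculator":
        a, b = args["expr"].split("*")
        return str(int(a) * int(b))
    raise ValueError(f"unknown tool {name}")


def run_agent(goal: str, max_steps: int = 5) -> tuple[str, Session]:
    session = Session(goal)
    for _ in range(max_steps):
        session.steps.append("model_call")            # model computation (GPU side)
        action = call_model(session)
        if "final" in action:
            session.steps.append("serialize_output")  # format the result for the caller
            return json.dumps({"result": action["final"]}), session
        session.steps.append("policy_check")          # CPU-side governance
        if action["tool"] not in ALLOWED_TOOLS:
            raise PermissionError(action["tool"])
        session.steps.append("tool_call")             # CPU/I/O-bound tool execution
        result = run_tool(action["tool"], action["args"])
        session.steps.append("update_context")        # memory/context management
        session.context.append(result)
    raise RuntimeError("step budget exhausted")
```

Running `run_agent("what is 6 * 7?")` records six steps, and only two are model calls. In a real system the other four are where network round-trips, sandboxes, and state stores show up.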

Tokens change the CPU discussion

Huang described Vera as designed to process tokens as fast as possible, rather than following the classic cloud CPU model centered on cores and general-purpose application throughput.

That framing is useful, with a caveat. CPUs don’t “process tokens” the way GPUs run matrix operations for transformer inference. But token-heavy agent systems still create plenty of CPU overhead around pre-processing, post-processing, routing, memory management, networking, and control flow. If Vera improves latency and throughput for those tasks, especially inside tightly coupled Nvidia systems, it can matter.
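To illustrate that caveat: after the GPU produces tokens, CPU-side code still has to check them, batch them, and push them onto the network. The minimal sketch below assumes a hypothetical model server that yields already-decoded text tokens and a hypothetical `<eos>` stop token. It frames the output as server-sent events (`data: ...` lines followed by a blank line).

```python
import json
from typing import Iterable, Iterator


def stream_sse(tokens: Iterable[str], stop_token: str = "<eos>", chunk_size: int = 4) -> Iterator[str]:
    """Coalesce decoded tokens into chunks and frame each chunk as a server-sent event."""
    buffer: list[str] = []
    for tok in tokens:
        if tok == stop_token:          # post-processing: stop-token check
            break
        buffer.append(tok)
        if len(buffer) == chunk_size:  # batching trades first-byte latency for fewer writes
            yield f"data: {json.dumps({'text': ''.join(buffer)})}\n\n"
            buffer.clear()
    if buffer:                         # flush whatever is left when the stream ends
        yield f"data: {json.dumps({'text': ''.join(buffer)})}\n\n"


for event in stream_sse(["Vera", " is", " a", " CPU", " for", " agents", "<eos>", "ignored"]):
    print(repr(event))
```

Chunk size is the design knob here. Larger chunks mean fewer writes and less per-token overhead, but a longer delay before the first text reaches the client.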

The traditional cloud CPU design target has been breadth: many cores, strong virtualization, predictable performance across varied workloads, and good efficiency for microservices, databases, and app servers. Agentic workloads can look different. They may involve lots of short-lived tasks, bursty tool calls, stateful coordination, and frequent context movement between CPU memory, accelerators, storage, and networking.

That creates a different optimization target:

lower latency between CPU and GPU
faster token streaming paths
better memory bandwidth for context-heavy workloads
tighter scheduling across inference pipelines
lower overhead for tool execution and orchestration
improved power efficiency under AI-specific server patterns

Nvidia has been building toward this for years. Grace, its Arm-based CPU, already gave Nvidia a CPU story for accelerated computing. Vera appears to sharpen that story around agents and the Rubin generation.

The technical question is whether Vera is materially better for agent workloads, or whether Nvidia is packaging predictable server-side needs around the hottest AI architecture of the moment. The answer is probably some of both.

The “billions of agents” bet

Huang’s TAM depends on a specific assumption: the world will have billions of AI agents, and those agents will need compute the way humans need PCs.

He put it plainly on the call. Today there are roughly a billion human users. In his view, there will eventually be billions of agents, each using tools, running tasks, and consuming CPU capacity.

The analogy is useful, but it hides a lot. A human with a PC is a high-latency, low-frequency operator. Agents can be spun up by the thousands, but most won’t run continuously. Many will be lightweight wrappers around workflows. Some will be scheduled jobs with model calls. Some will be persistent digital workers with memory, permissions, audit trails, and access to enterprise systems.

Infrastructure demand depends on implementation.

A basic support-ticket triage agent might need occasional inference plus database access. A coding agent that runs tests, edits files, opens pull requests, and debugs failures needs far more CPU, memory, sandboxing, and I/O. A multi-agent research system that fans out browser tasks, vector searches, code execution, and summarization pipelines can get expensive quickly.
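The fan-out case is where CPU and I/O pressure compounds. The minimal sketch below uses Python's `asyncio` and assumes a hypothetical `fetch_source` coroutine in place of real browser tasks or vector searches. It shows one research task spawning many concurrent subtasks. A semaphore caps how many run at once, so the orchestrator doesn't overrun its host.

```python
import asyncio


async def fetch_source(query: str, tracker: dict) -> str:
    """Hypothetical stand-in for a browser task, vector search, or code execution."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    try:
        await asyncio.sleep(0.01)  # simulated I/O wait
        return f"notes for {query!r}"
    finally:
        tracker["active"] -= 1


async def research(topic: str, n_subtasks: int, max_concurrency: int) -> tuple[list[str], int]:
    """Fan out n_subtasks, never running more than max_concurrency at once."""
    sem = asyncio.Semaphore(max_concurrency)
    tracker = {"active": 0, "peak": 0}

    async def bounded(i: int) -> str:
        async with sem:
            return await fetch_source(f"{topic} / angle {i}", tracker)

    results = await asyncio.gather(*(bounded(i) for i in range(n_subtasks)))
    return list(results), tracker["peak"]


if __name__ == "__main__":
    notes, peak = asyncio.run(research("Vera CPU", n_subtasks=20, max_concurrency=5))
    print(len(notes), "results, peak concurrency", peak)  # 20 results, peak concurrency 5
```

Every subtask still needs scheduling, state, and result merging on the CPU, even when the model calls themselves run elsewhere.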

The CPU market grows if agents become persistent, tool-heavy, and widely deployed inside companies. It grows less dramatically if enterprises consolidate tasks into fewer orchestrated services, or if inference platforms absorb much of the runtime complexity.

Nvidia is betting on sprawl. Enterprise software has a long history of rewarding that bet.

Why developers should care

For developers and AI engineers, Vera matters less as a chip announcement than as a signal about where AI infrastructure is heading.

The first wave of production AI work focused on model access: prompts, embeddings, retrieval, evals, fine-tuning, and inference endpoints. The next wave is runtime architecture. Agents create messy infrastructure requirements because they combine probabilistic model behavior with deterministic software systems.

That changes what teams need to measure.

Latency is now end-to-end task latency across planning, inference, tool calls, validation, retries, and result formatting. Cost includes CPU cycles, network calls, sandbox execution, logging, trace storage, vector database queries, and failed attempts. Security includes permission boundaries, tool access, secrets handling, auditability, and prompt-injection defenses.
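Below is a small sketch of task-level accounting. It assumes hypothetical per-step records, and the step names and cost units are illustrative rather than a standard schema. It also assumes steps run sequentially, so summed latency equals end-to-end latency. Every attempt is logged whether or not it succeeded, so retries and failures show up in both latency and cost instead of disappearing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    kind: str          # e.g. "plan", "inference", "tool_call", "validation", "format"
    latency_s: float   # wall-clock seconds for this step
    cost_units: float  # illustrative cost units (CPU, network, sandbox, tokens, ...)
    succeeded: bool = True


def summarize(steps: list[StepRecord]) -> dict:
    """End-to-end view of one sequential task: totals include failed attempts and retries."""
    total_latency = sum(s.latency_s for s in steps)
    inference_latency = sum(s.latency_s for s in steps if s.kind == "inference")
    return {
        "total_latency_s": total_latency,
        "inference_share": inference_latency / total_latency if total_latency else 0.0,
        "total_cost": sum(s.cost_units for s in steps),
        "wasted_cost": sum(s.cost_units for s in steps if not s.succeeded),
    }


# Made-up example task: one tool call times out and is retried.
task = [
    StepRecord("plan", 0.2, 1.0),
    StepRecord("inference", 1.5, 5.0),
    StepRecord("tool_call", 0.8, 2.0, succeeded=False),
    StepRecord("tool_call", 0.8, 2.0),
    StepRecord("validation", 0.3, 0.5),
    StepRecord("format", 0.4, 0.5),
]
print(summarize(task))
```

In this made-up task, inference accounts for well under half of the roughly four seconds end to end. The failed tool call adds cost without adding any value.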

If Nvidia can package CPU, GPU, networking, and software into an optimized agent platform, many teams will use it because integration friction is expensive. Nobody enjoys debugging performance across mismatched accelerators, drivers, inference servers, orchestration layers, and observability tools.

But tighter integration has a price. Nvidia’s biggest advantage is also its lock-in vector.

CUDA already shapes how AI teams build and deploy high-performance workloads. A Vera-Rubin stack could extend that gravity into agent runtime infrastructure. That may be fine for companies standardizing on Nvidia hardware, but it’s a risk for teams that need portability across AWS, Google Cloud, Azure, on-prem clusters, or cheaper inference providers.

Technical leaders should watch three practical details:

Software support: Hardware claims matter less than framework support. Agent frameworks, inference servers, schedulers, and observability tools need to take advantage of Vera-specific features without forcing teams into brittle custom paths.
Price-performance under real workloads: Benchmarks need to reflect multi-step agent behavior, not clean synthetic inference loops. Tool calls, retries, memory access, and I/O contention can erase theoretical gains.
Operational fit: Agent workloads need isolation, policy enforcement, and traceability. If the platform optimizes throughput but makes governance harder, enterprises will slow adoption.
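The operational-fit point can be sketched in a few lines. The example assumes a hypothetical per-agent tool allowlist and an in-memory audit log. A production system would need durable, tamper-evident storage and real sandboxing. Every tool request is checked and recorded, whether it is allowed or denied.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditEvent:
    agent_id: str
    tool: str
    allowed: bool
    timestamp: float


@dataclass
class ToolGateway:
    """Hypothetical gateway: enforces per-agent tool permissions and keeps a trace."""
    permissions: dict[str, set[str]]
    tools: dict[str, Callable[..., str]]
    audit_log: list[AuditEvent] = field(default_factory=list)

    def call(self, agent_id: str, tool: str, **kwargs) -> str:
        allowed = tool in self.permissions.get(agent_id, set())
        # Record the attempt before enforcing, so denials are traceable too.
        self.audit_log.append(AuditEvent(agent_id, tool, allowed, time.time()))
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self.tools[tool](**kwargs)


gateway = ToolGateway(
    permissions={"triage-agent": {"read_ticket"}},
    tools={
        "read_ticket": lambda ticket_id: f"ticket {ticket_id}: printer offline",
        "delete_ticket": lambda ticket_id: f"deleted {ticket_id}",
    },
)
print(gateway.call("triage-agent", "read_ticket", ticket_id=17))
try:
    gateway.call("triage-agent", "delete_ticket", ticket_id=17)
except PermissionError as err:
    print("denied:", err)
```

Putting the check and the trace in one choke point keeps governance from depending on each agent behaving well. It also adds one more CPU-side hop to every tool call.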

The hyperscaler tension

The timing of Huang’s pitch is deliberate. Cloud providers are openly trying to reduce Nvidia dependency.

AWS recently highlighted a large Meta deal involving millions of Amazon’s homegrown AI CPUs, and Amazon CEO Andy Jassy has argued that AWS can compete strongly in AI chips. That’s exactly the kind of narrative Nvidia wants to weaken.

Vera gives Nvidia an answer: if AI systems need a new class of CPU for agents, Nvidia wants to define that class before hyperscalers do.

The competition is messy. Hyperscalers don’t need to beat Nvidia in every benchmark. They need enough performance at better economics for their own fleets. A custom CPU or accelerator that works well inside AWS infrastructure can be highly valuable even if it doesn’t win the broader market.

Nvidia sells to the whole market. Its edge is the platform: chips, interconnects, libraries, systems, developer tooling, and a deep bench of customers already building around its stack. Huang’s “every major hyperscaler and system maker is partnering with us” line is meant to show that cloud companies can compete with Nvidia while still buying heavily from it.

Both things are happening. Cloud providers want independence from Nvidia, and they keep ordering Nvidia hardware.

The caveat behind the $200 billion number

A $200 billion TAM should be treated as a directional claim, not a forecast. TAM math in AI hardware often assumes aggressive adoption, high utilization, and limited price compression. Markets rarely behave that cleanly.
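Estimates of this kind are usually built by multiplying factors, so modest shortfalls compound. Take a purely illustrative decomposition, not Nvidia's methodology. Let $N$ be the number of deployed agents, $c$ the CPU capacity an active agent needs, $u$ the fraction of time agents are active, and $p$ the price per unit of capacity. If adoption, utilization, and price each come in 30% below the assumption:

```latex
\[
\mathrm{TAM} \approx N \cdot c \cdot u \cdot p
\]
\[
\frac{\mathrm{TAM}'}{\mathrm{TAM}}
  = \frac{(0.7N)\, c\, (0.7u)\, (0.7p)}{N\, c\, u\, p}
  = 0.7^{3} = 0.343
\]
```

Applied to a $200 billion starting point, that haircut leaves roughly $69 billion. That is still a large market, but a very different one.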

Several things could shrink or delay Nvidia’s Vera opportunity:

Agent adoption may be slower in regulated industries because tool-using systems are harder to secure and audit.
Model efficiency improvements could reduce infrastructure demand per task.
Hyperscalers may route more workloads to internal silicon.
Open standards and alternative runtimes could weaken Nvidia’s platform pull.
Enterprises may find that many “agents” are better implemented as standard workflow automation with occasional model calls.

There’s also a utilization problem. Agents can be bursty. Buying large amounts of specialized CPU capacity only makes sense if workloads are steady enough, or if the hardware also performs well on conventional server tasks. Nvidia will need to show Vera isn’t too narrow.
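The utilization problem comes down to simple arithmetic. The sketch below uses made-up hourly demand figures for illustration, not measured data. If a fleet is sized for peak demand, its average utilization is mean demand divided by peak demand.

```python
def peak_provisioned_utilization(hourly_demand: list[float]) -> float:
    """Average utilization of capacity sized to the peak hour of demand."""
    peak = max(hourly_demand)
    mean = sum(hourly_demand) / len(hourly_demand)
    return mean / peak


# Hypothetical CPU demand (in cores) over 8 hours; both profiles do the same total work.
bursty = [200, 200, 1000, 200, 200, 1000, 200, 200]
steady = [350, 400, 450, 400, 350, 400, 450, 400]

print(f"bursty: {peak_provisioned_utilization(bursty):.0%}")  # bursty: 40%
print(f"steady: {peak_provisioned_utilization(steady):.0%}")  # steady: 89%
```

Both profiles do the same total work, but the bursty one needs more than twice the provisioned capacity (1,000 versus 450 cores) and leaves most of it idle. That gap is why it matters whether Vera can pick up conventional server work between bursts.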

Still, the $20 billion standalone sales claim is hard to ignore. Even if some demand reflects early platform positioning or bundled customer commitments, it suggests buyers are preparing for CPU pressure around AI systems.

Nvidia is moving up the stack again

The Vera story fits Nvidia’s broader pattern. The company rarely sells a chip as just a chip now. It sells a computing architecture, then wraps it with software, networking, reference systems, and developer gravity.

That strategy worked for AI training. It worked for inference. Nvidia now wants it to work for agent execution.

For engineering teams, the practical takeaway is simple: start profiling the non-model parts of agent systems with the same seriousness applied to inference. Tool calls, memory movement, scheduling, sandboxing, retries, and audit logs are becoming first-order infrastructure costs.
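One low-effort way to start is to tag each phase of an agent task and measure where the wall-clock time goes. Below is a minimal sketch using only the Python standard library. The phase names are illustrative. The clock is injectable, defaulting to `time.perf_counter`, so the example can use a fake clock and produce deterministic numbers.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Iterator


class PhaseProfiler:
    """Accumulates wall-clock time per named phase of an agent task."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self.clock = clock
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = self.clock()
        try:
            yield
        finally:
            self.totals[name] += self.clock() - start

    def non_model_share(self, model_phases: frozenset = frozenset({"inference"})) -> float:
        """Fraction of measured time spent outside model calls."""
        total = sum(self.totals.values())
        model = sum(t for name, t in self.totals.items() if name in model_phases)
        return (total - model) / total if total else 0.0


# Fake clock readings (start, end) per phase, standing in for real work.
ticks = iter([0.0, 1.0, 1.0, 1.5, 1.5, 3.5, 3.5, 4.0])
prof = PhaseProfiler(clock=lambda: next(ticks))
with prof.phase("inference"):
    pass  # 1.0 s
with prof.phase("tool_call"):
    pass  # 0.5 s
with prof.phase("sandbox"):
    pass  # 2.0 s
with prof.phase("audit_log"):
    pass  # 0.5 s
print(f"{prof.non_model_share():.0%} of task time was outside the model")  # 75% of task time was outside the model
```

In real use you would drop the fake clock, wrap actual calls in `prof.phase(...)`, and export the totals to whatever tracing system the team already runs.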

If agents become a major production workload, the CPU side of AI will stop looking like background plumbing. Nvidia is trying to make sure it owns that layer too.
