Technology · November 1, 2025

AWS growth hits a 2022 high as cloud infrastructure demand holds up

AWS’s latest quarter says the AI compute squeeze is still getting worse

AWS just posted its fastest growth since 2022. Revenue rose 20% year over year to $33.1 billion in the third quarter of 2025, with $11.4 billion in operating income. That's the number Wall Street cares about.

For engineers, the more useful detail is underneath it. AWS added 3.8 gigawatts of data center capacity over the past year, opened a new region in New Zealand, and says it's still adding infrastructure as fast as it can because demand keeps coming. Andy Jassy put it plainly: AWS is investing hard in capacity, and it’s filling that capacity as soon as it comes online.

That matters because AI infrastructure demand still hasn't let up. The biggest buyers look even more aggressive. OpenAI and Oracle reportedly struck a long-term cloud deal starting in 2027, with OpenAI committed to paying Oracle $30 billion a year for data center services. Google and Anthropic have their own multi-year, multi-billion-dollar cloud arrangement. Those deals point to the same thing: frontier training and large-scale inference are now fixed line items for the biggest AI companies.

For everyone else, spare capacity is no longer a safe assumption.

3.8 GW is a serious number

Cloud earnings calls usually hide behind vague scale language. Gigawatts are harder to hand-wave away. Power is the constraint.

Adding 3.8 GW in a year amounts to a large wave of hyperscale buildout. A major data center facility might be rated around 50 MW to 100 MW on paper. Usable IT load is lower once you account for redundancy, cooling, and the rest of the overhead. Even so, 3.8 GW is enormous.
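
As a rough illustration of why facility ratings overstate what actually reaches the accelerators, here is a back-of-envelope sketch in Python. The PUE, redundancy, and facility-size figures are illustrative assumptions, not AWS numbers.

# Back-of-envelope: facility power rating vs. usable IT load.
facility_mw = 100          # nameplate rating of one large facility (illustrative)
pue = 1.3                  # power usage effectiveness: cooling and conversion losses
redundancy_factor = 0.9    # headroom held back for redundancy and maintenance

it_load_mw = facility_mw / pue * redundancy_factor
print(f"~{it_load_mw:.0f} MW of usable IT load from a {facility_mw} MW facility")

added_gw = 3.8
facilities = added_gw * 1000 / facility_mw
print(f"{added_gw} GW added is on the order of {facilities:.0f} facilities of that size")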

It also says a lot about where AWS thinks demand is headed. This goes beyond routine region expansion for ordinary cloud workloads. The buildout is for dense accelerator clusters, heavier networking, and tougher cooling demands.

Power now shapes cloud architecture almost as much as CPU and GPU roadmaps. If you're waiting for a market where H100s, next-gen Trainium, fast interconnect, and friendly spot pricing all line up at once, you'll be waiting a while.

The bottleneck isn't just GPUs

Teams still talk about "getting GPUs" as if that settles the problem. It doesn't. Distributed training performance depends on the system wrapped around the accelerators.

On AWS, that usually means:

  • EFA for low-latency, high-throughput networking
  • tight placement with cluster placement groups
  • NCCL tuned well enough that allreduce doesn't leave expensive silicon idle
  • storage fast enough to keep the training loop fed

This is where teams still burn money. They lock down a block of high-end instances, wire up torchrun, and assume utilization will sort itself out. Then the cluster spends too much wall-clock time waiting on I/O, dealing with topology issues, or eating long-tail latency between nodes.

A minimal multi-node PyTorch setup on EFA can look deceptively simple:

# EFA + NCCL environment for multi-node PyTorch; typical starting points, not universal defaults.
export FI_PROVIDER=efa
export FI_EFA_USE_DEVICE_RDMA=1   # device RDMA where the instance type supports it
export NCCL_SOCKET_IFNAME=eth0    # interface for NCCL's out-of-band bootstrap
export NCCL_ALGO=Ring

# One launcher invocation per node; NUM_NODES, NODE_RANK, and MASTER_ADDR come from
# your scheduler or cluster tooling.
torchrun --nproc_per_node=8 --nnodes=$NUM_NODES --node_rank=$NODE_RANK \
  --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29400 train.py

The hard part starts after that. You still need to check whether your job topology matches the network layout you actually got, whether oversubscription shows up under load, and whether checkpointing is dragging down throughput.
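
One way to catch topology and oversubscription problems early is a bare allreduce spot-check before any real training starts. This is a minimal sketch, assuming the job was launched with torchrun as above; the payload size and iteration counts are arbitrary illustrative choices.

# Allreduce bandwidth spot-check: compare measured bus bandwidth against what
# the instance type and placement are supposed to deliver.
import os
import time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

n_elems = 256 * 1024 * 1024  # 1 GiB of fp32 per rank
x = torch.ones(n_elems, dtype=torch.float32, device="cuda")

for _ in range(5):           # warm-up
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 20
start = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
elapsed = (time.perf_counter() - start) / iters

world = dist.get_world_size()
size_bytes = n_elems * 4
# Ring allreduce moves roughly 2 * (N - 1) / N of the payload per rank.
bus_bw = (size_bytes * 2 * (world - 1) / world) / elapsed / 1e9
if dist.get_rank() == 0:
    print(f"allreduce avg {elapsed * 1000:.1f} ms, ~{bus_bw:.1f} GB/s bus bandwidth")
dist.destroy_process_group()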

For large LLM training jobs, that isn't polish. It's cost.

Storage still wastes good clusters

GPU clusters get the attention. Storage is often where the actual trouble starts.

For training, AWS's common pattern is still S3 for durable object storage with FSx for Lustre on top as the high-throughput working layer. That setup makes sense. S3 is cheap and durable. FSx can feed active jobs at tens to hundreds of GB/s. But object storage is still bad at small files and metadata-heavy workloads, and plenty of data pipelines still look exactly like that.

If your training run spends 20% to 40% of its time stalled on input, the first fix probably isn't in the model stack. It's in the data path.

That means aggressive sharding, larger objects, caching hot datasets closer to compute, and using local NVMe when preprocessing or embedding generation would otherwise flood the network. A lot of "GPU underutilization" is really a storage architecture problem with a very expensive symptom.
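
As a small illustration of the sharding step, here is a sketch that packs many small files into larger tar shards before they hit object storage. The directory layout, shard size, and naming are assumptions; WebDataset-style loaders can then stream these shards sequentially instead of hammering metadata.

# Pack small training files into larger tar shards (illustrative paths and sizes).
import tarfile
from pathlib import Path

SHARD_SIZE = 2_000                  # files per shard; tune for your target object size
src = Path("data/raw")              # directory of small files (hypothetical layout)
out = Path("data/shards")
out.mkdir(parents=True, exist_ok=True)

files = sorted(f for f in src.rglob("*") if f.is_file())

for i in range(0, len(files), SHARD_SIZE):
    shard_path = out / f"shard-{i // SHARD_SIZE:06d}.tar"
    with tarfile.open(shard_path, "w") as tar:
        for f in files[i:i + SHARD_SIZE]:
            # Flat names so a loader can stream members sequentially.
            tar.add(f, arcname=f.name)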

It also changes how teams should test scale. Benchmarking one node and extrapolating to 128 accelerators is still common, and it still fails in familiar ways. The useful question is whether the full pipeline holds up under cluster-scale load, including data ingress, checkpointing, and retry behavior.

Inferentia deserves a real look

AWS's AI stack now has a clear split. Train where the software ecosystem is strongest and capacity is available. Serve where the economics look better.

For many teams, that still means training on NVIDIA-based EC2 fleets such as p5 instances with H100s, then evaluating Inferentia for inference. AWS has spent time on the Neuron SDK to make that handoff less painful for PyTorch users, especially for transformer-heavy workloads and common operator sets.

A minimal flow looks like this:

import torch
import torch_neuronx

# model: a torch.nn.Module in eval mode; example_inputs: tensors matching its
# forward() signature. Compilation to Neuron happens during tracing.
neuron_model = torch_neuronx.trace(model, example_inputs)
torch.jit.save(neuron_model, "model_neuron.pt")  # compiled artifact to load at serve time

That's the easy part. The tougher question is whether your model lands cleanly on Inferentia without awkward compromises in latency, operator support, or deployment complexity.

AWS still has work to do here. CUDA remains the default mental model for most ML teams because the tooling is familiar, the community debugging surface is massive, and the edge cases are well worn. Neuron has improved, but AWS silicon still asks teams to absorb some ecosystem friction in exchange for cheaper inference. If your traffic is large and steady, that can pencil out. If your deployment changes every week, maybe not.

Capacity planning is turning into a product question

Amazon reportedly cut around 14,000 corporate roles this week to keep funding AI investment. That's ugly, but it tells you where Amazon sees the return. Large AI customers are signing contracts that reward whoever can guarantee power, cooling, network density, and availability.

That changes technical planning.

A year ago, plenty of teams could treat compute procurement as a vendor discussion plus some Terraform. Now it looks closer to capacity finance. If your roadmap depends on multi-node training windows, you may need to secure commitments well before you need them, whether through AWS account teams, reserved capacity, or products like EC2 Capacity Blocks for ML where they fit.

That will be uncomfortable for teams raised on elastic cloud rhetoric. Cloud is a lot less elastic when the underlying asset is a tightly networked rack of accelerators sitting behind power limits.

The practical response is straightforward:

  • reserve baseline capacity early
  • design training jobs to restart cleanly from frequent checkpoints (see the sketch after this list)
  • keep fallback plans such as LoRA, distillation, or smaller-context variants
  • separate latency-sensitive inference from throughput-heavy training so one doesn't interfere with the other
  • measure the whole pipeline, not just model-side FLOPS
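
On the checkpointing point, a minimal save/resume pattern is enough to make restarts boring. This is a sketch, not a prescription; the path and save cadence are illustrative assumptions.

# Minimal checkpoint/resume so a preempted or rescheduled job restarts cleanly.
import os
import torch

CKPT_PATH = "checkpoints/latest.pt"

def save_checkpoint(model, optimizer, step):
    os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
    tmp = CKPT_PATH + ".tmp"
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "step": step},
        tmp,
    )
    os.replace(tmp, CKPT_PATH)  # atomic swap avoids truncated checkpoints

def load_checkpoint(model, optimizer):
    if not os.path.exists(CKPT_PATH):
        return 0
    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

# In the training loop: resume, then save every N steps.
# start_step = load_checkpoint(model, optimizer)
# for step in range(start_step, total_steps):
#     ...train...
#     if step % 500 == 0:
#         save_checkpoint(model, optimizer, step)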

A lot of roadmaps need a Plan B that assumes the biggest cluster you want won't be available on schedule.

The New Zealand region matters

AWS launching a region in New Zealand looks routine at first glance. It isn't.

New regions now solve for three things at once: data residency, access to new power and grid interconnects, and political pressure around sovereign workloads. For regulated customers, that changes architecture choices immediately. For AI teams, it also hints at where future accelerator capacity may show up. Geography is starting to follow electricity and policy as much as customer proximity.

That has direct consequences for distributed systems design. If your stack spans sovereign regions, model serving, vector storage, and training data flows get harder. Security policies get tighter. Replication costs rise. Latency budgets get worse. Compliance stops being paperwork and starts shaping topology.

What senior engineers should do now

The broad cloud story is simple: demand for AI infrastructure is still outrunning comfortable supply, and AWS is spending hard to keep up.

The engineering story is narrower and more useful. Expect the bottlenecks in the places vendor marketing barely mentions: power-backed availability, network topology, storage throughput, cluster fragmentation, and inference economics after the model ships.

If you're planning big training runs on AWS, test the input pipeline before adding nodes. If you're scaling inference, price out Inferentia seriously instead of dismissing it as AWS-specific weirdness. If your roadmap depends on accelerator access later this year, start those capacity conversations early.
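
A quick way to run that input-pipeline test is to iterate the dataset with no model attached and compare samples per second against what the planned cluster will consume. The loader settings below are placeholders, and dataset stands in for whatever Dataset implementation you actually use.

# Measure input throughput in isolation before adding nodes.
import time
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

start = time.perf_counter()
n_samples = 0
for i, batch in enumerate(loader):
    n_samples += len(batch[0]) if isinstance(batch, (list, tuple)) else len(batch)
    if i == 200:  # a few hundred batches is usually enough to see steady state
        break
elapsed = time.perf_counter() - start
print(f"input pipeline alone: {n_samples / elapsed:.0f} samples/sec")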

And if your team still talks about cloud AI as mostly a software problem, that view is outdated. The stack starts with power now. Everything else sits on top of it.

What to watch

The harder part is not the headline capacity number. It is whether the economics, supply chain, power availability, and operational reliability hold up once teams try to use this at production scale. Buyers should treat the announcement as a signal of direction, not proof that cost, latency, or availability problems are solved.
