Amazon hands Peter DeSantis the keys to its AI stack
Amazon has put longtime AWS executive Peter DeSantis in charge of a new AI organization spanning the company’s Nova models, custom silicon work, and quantum efforts. It’s a management change with a pretty clear point behind it.
AWS has spent the past year trying to close a perception gap. It has the cloud footprint, the enterprise sales machine, and a broad AI catalog through Bedrock, but its first-party model story has felt looser than Microsoft’s Azure plus OpenAI setup or Google’s TPU plus Gemini stack. Putting DeSantis over models, chips, and quantum signals that Amazon wants tighter control over the whole system instead of a pile of related AI products.
Andy Jassy said as much in a staff note after Nova 2 launched at re:Invent. Amazon thinks it has room to compete by optimizing across models, silicon, cloud software, and infrastructure. This reorg is built around that idea.
DeSantis also makes sense for the job. He’s a 27-year Amazon veteran who has run major parts of AWS infrastructure. If Amazon wants someone who can turn internal platform bets into revenue, he’s an obvious choice.
Why it matters
Plenty of AI companies can rent GPUs and ship a model endpoint. That part is commoditizing. The harder work is getting cost, latency, scaling behavior, and security controls into a shape large customers will standardize on.
Amazon thinks that’s where it has an opening.
It already has Trainium and Inferentia for training and inference, Graviton for general compute, Nitro for isolation, EFA for high-speed networking, Bedrock for hosted models, and SageMaker for custom training pipelines. It also has Braket for quantum services, though that still sits well outside most production AI work. Until now, those pieces lived in the same company without one AI leader sitting over them.
The pitch is straightforward. If Amazon can line up model design with the hardware it controls and the runtime it already sells, it can push inference costs down and make performance more predictable than a stack built mostly on third-party GPUs and partner models.
That matters. For enterprise AI, cost per request and p95 latency usually count for more than benchmark bragging rights.
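As a trivial illustration of why those two numbers drive procurement decisions, here is how a team might compute them from a latency sample. All figures are made up, and the price is hypothetical, not AWS pricing.

```python
import math

# Toy numbers: per-request latencies in milliseconds. Illustrative only.
latencies_ms = sorted([120, 135, 140, 150, 155, 160, 180, 210, 450, 900])

# Nearest-rank p95: smallest sample such that >= 95% of requests are at or below it.
idx = math.ceil(0.95 * len(latencies_ms)) - 1
p95 = latencies_ms[idx]

# Hypothetical unit economics -- not AWS pricing.
price_per_1k_tokens = 0.002
avg_tokens_per_request = 700
cost_per_request = price_per_1k_tokens * avg_tokens_per_request / 1000

print(f"p95: {p95} ms, cost/request: ${cost_per_request:.4f}")
```

Note how the tail dominates: two slow requests push p95 to 900 ms even though the median sits around 157 ms. Benchmark averages hide exactly this.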
The technical case for co-design
The interesting part is co-design, stripped of the executive-speak.
On the model side, Nova 2 is Amazon’s latest push into multimodal foundation models for enterprise use. That usually means some mix of text, image, and maybe audio, plus retrieval hooks, tool use, and policy controls that fit enterprise workflows. The likely architecture path is familiar: transformer-heavy backbones, mixture-of-experts for scaling, prompt caching, and speculative decoding to cut latency.
None of that is new by itself. The point is building those features around the constraints and strengths of Amazon’s own chips and runtimes.
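Speculative decoding is worth unpacking, since it is the mechanism behind many of those latency claims. The sketch below is a generic toy of the greedy variant, where a cheap draft model proposes a few tokens and an expensive target model verifies them; it is not anything Nova-specific, and the toy "models" are just counters.

```python
from typing import Callable, List

Token = int

def speculative_step(
    prefix: List[Token],
    draft_next: Callable[[List[Token]], Token],   # cheap model: one greedy token
    target_next: Callable[[List[Token]], Token],  # expensive model: one greedy token
    k: int = 4,
) -> List[Token]:
    """One round of greedy speculative decoding.

    The draft model proposes k tokens autoregressively; the target model
    checks each position and we keep the longest agreeing prefix, plus the
    target's own token at the first disagreement. In a real system the k
    target checks happen in one batched forward pass, which is the latency win.
    """
    # 1. Draft proposes k tokens cheaply.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)

    # 2. Target verifies position by position (batched in practice).
    accepted: List[Token] = []
    ctx = list(prefix)
    for t in proposed:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target overrides at first mismatch
            break
    else:
        # All k draft tokens accepted; target adds one bonus token.
        accepted.append(target_next(ctx))
    return accepted

# Toy "models" over integer tokens: target counts up, draft mostly agrees.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if ctx[-1] % 5 else ctx[-1] + 2

print(speculative_step([1], draft, target))  # [2, 3, 4, 5, 6]: 5 tokens per round
```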
On silicon, AWS already has the foundation. Trainium is for training, Inferentia for inference, and the Neuron SDK compiles PyTorch and TensorFlow graphs down to those accelerators. If DeSantis’ org really owns both the model roadmap and the chip roadmap, Amazon can tune hardware around the ugliest parts of modern LLM serving instead of following a general-purpose GPU template.
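For the PyTorch side of that pipeline, the Neuron flow today looks roughly like the sketch below. It assumes a Trn1 or Inf2 instance with the torch-neuronx package installed; treat it as an illustration of the compile-ahead-of-time workflow rather than a reference.

```python
import torch
import torch_neuronx  # AWS Neuron SDK's PyTorch frontend (Trn1/Inf2 instances)

# Any traceable PyTorch module stands in for a model here.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).eval()

example = torch.rand(1, 512)

# trace() captures the graph and compiles it ahead of time for NeuronCores.
# Dynamic control flow that escapes the trace is exactly the kind of
# "graph-capture quirk" that bites in practice.
neuron_model = torch_neuronx.trace(model, example)

# The compiled artifact saves and reloads like a TorchScript module.
torch.jit.save(neuron_model, "model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")
print(restored(example).shape)
```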
That probably means work in a few familiar places:
- better KV-cache handling, because long-context inference burns memory fast (a rough sizing sketch follows this list)
- larger or smarter on-chip memory layouts to reduce movement overhead
- improved interconnects for distributed inference and training
- kernel-level tuning for attention, feed-forward layers, and MoE routing
- tighter graph capture so Python-side orchestration stops dragging performance down
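The first bullet is easy to quantify. A KV cache stores two tensors, keys and values, per layer per token, so memory grows linearly with both context length and batch size. The helper below does the arithmetic for an illustrative 70B-class configuration with grouped-query attention; the numbers describe no specific Amazon model.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int,
    bytes_per_elem: int = 2,  # fp16/bf16
) -> int:
    """Bytes held by the KV cache: 2 tensors (K and V) per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative 70B-class config with grouped-query attention (8 KV heads).
gib = kv_cache_bytes(
    n_layers=80, n_kv_heads=8, head_dim=128, seq_len=128_000, batch=8
) / 2**30
print(f"{gib:.1f} GiB of accelerator memory, before weights or activations")
```

That prints 312.5 GiB for a single batch of eight long-context requests, which is why cache layout and eviction are hardware problems, not just software ones.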
For developers, this is where a lot of the waste still sits. Plenty of so-called AI platform performance comes down to moving less data, keeping caches warm, and avoiding scheduler weirdness when traffic spikes.
If Amazon can run Nova 2 materially cheaper on Inferentia than a comparable external model on GPU-backed instances, procurement math changes quickly.
Bedrock gets even more strategic
One of Amazon’s smarter calls over the last two years was avoiding an all-in bet on one model family. Bedrock hosts Amazon’s own models, but also partners like Anthropic and others. That gave AWS credibility with customers who wanted options.
This reorg reinforces that.
Amazon can push a clearer two-track pitch: use Nova if you want the tightest AWS integration and the best economics, or choose another model in Bedrock if behavior, benchmarks, or governance fit better. Either way, the customer stays inside AWS’s control plane.
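In code, the two-track pitch is literally a one-string change on Bedrock's Converse API. The sketch below uses boto3; the model IDs are examples, and availability varies by region and account (some models require inference profiles).

```python
import boto3

# Same client, same call shape -- only the model ID changes between tracks.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = "Summarize our Q3 incident postmortem in three bullets."

# Track 1: first-party model, pitched on AWS integration and economics.
print(ask("amazon.nova-pro-v1:0", prompt))

# Track 2: partner model in the same control plane, if its behavior fits better.
print(ask("anthropic.claude-3-5-sonnet-20240620-v1:0", prompt))
```

Either call lands in the same IAM, logging, and networking envelope, which is the point.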
That matters because the money isn’t only in model inference. It’s in the surrounding stack: vector search, orchestration, observability, IAM policies, VPC routing, private connectivity, data residency, and all the ugly internal systems that end up wired into an agent workflow. Bedrock gets stickier if the first-party model is also the one best tuned for the underlying hardware.
There’s a message here for model vendors too. If your model compiles cleanly to Neuron, your economics inside AWS probably look better. If you rely on CUDA-only tricks or fussy kernels that map poorly to Amazon silicon, serving costs are likely to climb. That won’t stop adoption on its own, but it does squeeze margins.
The spending says plenty
The timing fits Amazon’s broader AI spending spree. AWS announced a $50 billion commitment to U.S. government AI infrastructure in November. Amazon has already invested $8 billion in Anthropic. It has also reportedly considered a $10 billion investment in OpenAI, which tells you how wide a net it’s willing to cast.
That’s a lot of capital pointed in different directions. A centralized AI org is one way to keep it from turning into a portfolio with no operating model behind it.
DeSantis isn’t there to supervise interesting research projects. He’s there to turn those bets into a platform enterprises can buy without wondering whether AWS has three overlapping AI stories and no center.
What engineers should watch
The near-term question is whether this changes the developer experience or just the reporting lines.
A few areas matter over the next couple of quarters.
Neuron support and model portability
If Amazon wants customers to lean harder on its silicon, the Neuron toolchain has to get less painful. Engineers will put up with some friction for lower costs, but not endless graph-capture quirks or missing operator support. Better compatibility with FSDP, ZeRO, tensor parallelism, and common training loops matters more than another polished keynote demo.
Bedrock and SageMaker convergence
AWS has often felt split between managed model APIs and the serious ML platform. In practice, many teams need both. They prototype in Bedrock, then want custom fine-tuning, distillation, eval pipelines, or policy-heavy deployment paths in SageMaker. A unified AI org should smooth out some of that seam.
If Amazon executes well, teams will move between hosted endpoints, custom training, and agent orchestration without stitching together half a dozen services by hand.
Observability at the token and routing layer
As models get larger and more MoE-heavy, debugging gets stranger. Latency spikes can come from routing instability, cache pressure, poor batching, or retrieval stalls upstream. AWS has the infrastructure depth to expose useful telemetry here. If it does, that would be genuinely useful.
Most AI observability is still too high-level. Engineers need to see where requests actually go sideways.
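Until platforms expose that telemetry natively, teams end up building it themselves. Here is a hypothetical sketch of the kind of per-stage breakdown that makes routing and cache problems visible; the stage names and the emit target are placeholders, not any AWS API.

```python
import time
from contextlib import contextmanager

class RequestTrace:
    """Accumulates per-stage wall-clock timings for one inference request.

    The stages here (retrieval, decode, etc.) are illustrative; the point is
    attributing a latency spike to a stage instead of averaging it away.
    """

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000  # ms

    def emit(self) -> None:
        # Placeholder: ship to CloudWatch, OpenTelemetry, or a log pipeline.
        print(self.request_id, {k: f"{v:.1f}ms" for k, v in self.stages.items()})

trace = RequestTrace("req-123")
with trace.stage("retrieval"):
    time.sleep(0.02)   # stand-in for a vector-store lookup
with trace.stage("decode"):
    time.sleep(0.05)   # stand-in for token generation
trace.emit()
```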
Security for regulated deployments
AWS has an opening with government and regulated industries because it already speaks their language: KMS, PrivateLink, VPC isolation, audit logs, policy controls, enclave-based isolation. If Amazon ties those cleanly into Bedrock and Nova, it has a cleaner enterprise story than many model providers that still look like API startups sitting on top of a GPU fleet.
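The plumbing for that story already exists on the client side. Assuming a VPC interface endpoint for Bedrock runtime has been provisioned, keeping inference traffic off the public internet is a configuration detail; the endpoint DNS name below is a placeholder.

```python
import boto3
from botocore.config import Config

# Hypothetical VPC interface endpoint (PrivateLink) for bedrock-runtime.
# With this set, requests resolve inside the VPC instead of the public API.
PRIVATE_ENDPOINT = (
    "https://vpce-0abc123-example.bedrock-runtime.us-east-1.vpce.amazonaws.com"
)

bedrock = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    endpoint_url=PRIVATE_ENDPOINT,
    config=Config(retries={"max_attempts": 3, "mode": "standard"}),
)
# From here, calls like bedrock.converse(...) traverse PrivateLink, and the
# usual IAM policies, CloudTrail audit logs, and KMS-encrypted storage apply.
```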
That won’t improve the models. It will make them easier to buy.
The limits are obvious too
Amazon’s strategy is coherent. It’s also late.
Microsoft still has the advantage in market perception because Azure became the default place to consume frontier AI through OpenAI. Google has spent years building its own silicon and has a tighter internal story around TPUs and Gemini than it usually gets credit for. AWS can catch up in enterprise adoption, but it isn't leading the model conversation today.
There’s execution risk too. Vertical integration works when the pieces fit cleanly. When they don’t, customers end up stuck between services that are supposedly optimized together but awkward to customize. AWS has a long track record of shipping powerful infrastructure that also asks customers to understand too many moving parts.
And quantum remains the least urgent piece of this. Braket has legitimate uses in research and hybrid workflows, but for most AI engineers it’s background noise unless Amazon can make orchestration and error mitigation meaningfully practical. That’s nowhere near the main competitive fight.
The broader direction still makes sense. Amazon wants the model, the chip, the runtime, and the control plane working in concert.
That’s a better use of its strengths than chasing chatbot headlines. Now it has to ship.