Amazon hands Peter DeSantis the keys to its AI stack
Amazon has put longtime AWS executive Peter DeSantis in charge of a new AI organization spanning the company’s Nova models, custom silicon work, and quantum efforts. It’s a management change with a pretty clear point behind it.
AWS has spent the past year trying to close a perception gap. It has the cloud footprint, the enterprise sales machine, and a broad AI catalog through Bedrock, but its first-party model story has felt looser than Microsoft’s Azure plus OpenAI setup or Google’s TPU plus Gemini stack. Putting DeSantis over models, chips, and quantum signals that Amazon wants tighter control over the whole system instead of a pile of related AI products.
Andy Jassy said as much in a staff note after Nova 2 launched at re:Invent. Amazon thinks it has room to compete by optimizing across models, silicon, cloud software, and infrastructure. This reorg is built around that idea.
DeSantis also makes sense for the job. He’s a 27-year Amazon veteran who has run major parts of AWS infrastructure. If Amazon wants someone who can turn internal platform bets into revenue, he’s an obvious choice.
Why it matters
Plenty of AI companies can rent GPUs and ship a model endpoint. That part is commoditizing. The harder work is getting cost, latency, scaling behavior, and security controls into a shape large customers will standardize on.
Amazon thinks that’s where it has an opening.
It already has Trainium and Inferentia for training and inference, Graviton for general compute, Nitro for isolation, EFA for high-speed networking, Bedrock for hosted models, and SageMaker for custom training pipelines. It also has Braket for quantum services, though that still sits well outside most production AI work. Until now, those pieces lived in the same company without one AI leader sitting over them.
The pitch is straightforward. If Amazon can line up model design with the hardware it controls and the runtime it already sells, it can push inference costs down and make performance more predictable than a stack built mostly on third-party GPUs and partner models.
That matters. For enterprise AI, cost per request and p95 latency usually count for more than benchmark bragging rights.
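As a trivial illustration of why those two numbers drive procurement decisions, here is how a team might compute them from a latency sample. All figures are made up, and the price is hypothetical, not AWS pricing.

```python
import math

# Toy numbers: per-request latencies in milliseconds. Illustrative only.
latencies_ms = sorted([120, 135, 140, 150, 155, 160, 180, 210, 450, 900])

# Nearest-rank p95: smallest sample such that >= 95% of requests are at or below it.
idx = math.ceil(0.95 * len(latencies_ms)) - 1
p95 = latencies_ms[idx]

# Hypothetical unit economics -- not AWS pricing.
price_per_1k_tokens = 0.002
avg_tokens_per_request = 700
cost_per_request = price_per_1k_tokens * avg_tokens_per_request / 1000

print(f"p95: {p95} ms, cost/request: ${cost_per_request:.4f}")
```

Note how the tail dominates: two slow requests push p95 to 900 ms even though the median sits around 157 ms. Benchmark averages hide exactly this.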
The technical case for co-design
The interesting part is co-design, stripped of the executive-speak.
On the model side, Nova 2 is Amazon’s latest push into multimodal foundation models for enterprise use. That usually means some mix of text, image, and maybe audio, plus retrieval hooks, tool use, and policy controls that fit enterprise workflows. The likely architecture path is familiar: transformer-heavy backbones, mixture-of-experts for scaling, prompt caching, and speculative decoding to cut latency.
None of that is new by itself. The point is building those features around the constraints and strengths of Amazon’s own chips and runtimes.
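Speculative decoding is worth unpacking, since it is the mechanism behind many of those latency claims. The sketch below is a generic toy of the greedy variant, where a cheap draft model proposes a few tokens and an expensive target model verifies them; it is not anything Nova-specific, and the toy "models" are just counters.

```python
from typing import Callable, List

Token = int

def speculative_step(
    prefix: List[Token],
    draft_next: Callable[[List[Token]], Token],   # cheap model: one greedy token
    target_next: Callable[[List[Token]], Token],  # expensive model: one greedy token
    k: int = 4,
) -> List[Token]:
    """One round of greedy speculative decoding.

    The draft model proposes k tokens autoregressively; the target model
    checks each position and we keep the longest agreeing prefix, plus the
    target's own token at the first disagreement. In a real system the k
    target checks happen in one batched forward pass, which is the latency win.
    """
    # 1. Draft proposes k tokens cheaply.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)

    # 2. Target verifies position by position (batched in practice).
    accepted: List[Token] = []
    ctx = list(prefix)
    for t in proposed:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target overrides at first mismatch
            break
    else:
        # All k draft tokens accepted; target adds one bonus token.
        accepted.append(target_next(ctx))
    return accepted

# Toy "models" over integer tokens: target counts up, draft mostly agrees.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if ctx[-1] % 5 else ctx[-1] + 2

print(speculative_step([1], draft, target))  # [2, 3, 4, 5, 6]: 5 tokens per round
```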
On silicon, AWS already has the foundation. Trainium is for training, Inferentia for inference, and the Neuron SDK compiles PyTorch and TensorFlow graphs down to those accelerators. If DeSantis’ org really owns both the model roadmap and the chip roadmap, Amazon can tune hardware around the ugliest parts of modern LLM serving instead of following a general-purpose GPU template.
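For the PyTorch side of that pipeline, the Neuron flow today looks roughly like the sketch below. It assumes a Trn1 or Inf2 instance with the torch-neuronx package installed; treat it as an illustration of the compile-ahead-of-time workflow rather than a reference.

```python
import torch
import torch_neuronx  # AWS Neuron SDK's PyTorch frontend (Trn1/Inf2 instances)

# Any traceable PyTorch module stands in for a model here.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).eval()

example = torch.rand(1, 512)

# trace() captures the graph and compiles it ahead of time for NeuronCores.
# Dynamic control flow that escapes the trace is exactly the kind of
# "graph-capture quirk" that bites in practice.
neuron_model = torch_neuronx.trace(model, example)

# The compiled artifact saves and reloads like a TorchScript module.
torch.jit.save(neuron_model, "model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")
print(restored(example).shape)
```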
That probably means work in a few familiar places:
- better KV-cache handling, because long-context inference burns memory fast (a rough sizing sketch follows this list)
- larger or smarter on-chip memory layouts to reduce movement overhead
- improved interconnects for distributed inference and training
- kernel-level tuning for attention, feed-forward layers, and MoE routing
- tighter graph capture so Python-side orchestration stops dragging performance down
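The first bullet is easy to quantify. A KV cache stores two tensors, keys and values, per layer per token, so memory grows linearly with both context length and batch size. The helper below does the arithmetic for an illustrative 70B-class configuration with grouped-query attention; the numbers describe no specific Amazon model.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int,
    bytes_per_elem: int = 2,  # fp16/bf16
) -> int:
    """Bytes held by the KV cache: 2 tensors (K and V) per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative 70B-class config with grouped-query attention (8 KV heads).
gib = kv_cache_bytes(
    n_layers=80, n_kv_heads=8, head_dim=128, seq_len=128_000, batch=8
) / 2**30
print(f"{gib:.1f} GiB of accelerator memory, before weights or activations")
```

That prints 312.5 GiB for a single batch of eight long-context requests, which is why cache layout and eviction are hardware problems, not just software ones.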
For developers, this is where a lot of the waste still sits. Plenty of so-called AI platform performance comes down to moving less data, keeping caches warm, and avoiding scheduler weirdness when traffic spikes.
If Amazon can run Nova 2 materially cheaper on Inferentia than a comparable external model on GPU-backed instances, procurement math changes quickly.
Bedrock gets even more strategic
One of Amazon’s smarter calls over the last two years was avoiding an all-in bet on one model family. Bedrock hosts Amazon’s own models, but also partners like Anthropic and others. That gave AWS credibility with customers who wanted options.
This reorg reinforces that.
Amazon can push a clearer two-track pitch: use Nova if you want the tightest AWS integration and the best economics, or choose another model in Bedrock if behavior, benchmarks, or governance fit better. Either way, the customer stays inside AWS’s control plane.
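In code, the two-track pitch is literally a one-string change on Bedrock's Converse API. The sketch below uses boto3; the model IDs are examples, and availability varies by region and account (some models require inference profiles).

```python
import boto3

# Same client, same call shape -- only the model ID changes between tracks.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = "Summarize our Q3 incident postmortem in three bullets."

# Track 1: first-party model, pitched on AWS integration and economics.
print(ask("amazon.nova-pro-v1:0", prompt))

# Track 2: partner model in the same control plane, if its behavior fits better.
print(ask("anthropic.claude-3-5-sonnet-20240620-v1:0", prompt))
```

Either call lands in the same IAM, logging, and networking envelope, which is the point.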
That matters because the money isn’t only in model inference. It’s in the surrounding stack: vector search, orchestration, observability, IAM policies, VPC routing, private connectivity, data residency, and all the ugly internal systems that end up wired into an agent workflow. Bedrock gets stickier if the first-party model is also the one best tuned for the underlying hardware.
There’s a message here for model vendors too. If your model compiles cleanly to Neuron, your economics inside AWS probably look better. If you rely on CUDA-only tricks or fussy kernels that map poorly to Amazon silicon, serving costs are likely to climb. That won’t stop adoption on its own, but it does squeeze margins.
The spending says plenty
The timing fits Amazon’s broader AI spending spree. AWS announced a $50 billion commitment to U.S. government AI infrastructure in November. Amazon has already invested $8 billion in Anthropic. It has also reportedly considered a $10 billion investment in OpenAI, which tells you how wide a net it’s willing to cast.
That’s a lot of capital pointed in different directions. A centralized AI org is one way to keep it from turning into a portfolio with no operating model behind it.
DeSantis isn’t there to supervise interesting research projects. He’s there to turn those bets into a platform enterprises can buy without wondering whether AWS has three overlapping AI stories and no center.
What engineers should watch
The near-term question is whether this changes the developer experience or just the reporting lines.
A few areas matter over the next couple of quarters.
Neuron support and model portability
If Amazon wants customers to lean harder on its silicon, the Neuron toolchain has to get less painful. Engineers will put up with some friction for lower costs, but not endless graph-capture quirks or missing operator support. Better compatibility with FSDP, ZeRO, tensor parallelism, and common training loops matters more than another polished keynote demo.
Bedrock and SageMaker convergence
AWS has often felt split between managed model APIs and the serious ML platform. In practice, many teams need both. They prototype in Bedrock, then want custom fine-tuning, distillation, eval pipelines, or policy-heavy deployment paths in SageMaker. A unified AI org should smooth out some of that seam.
If Amazon executes well, teams will move between hosted endpoints, custom training, and agent orchestration without stitching together half a dozen services by hand.
Observability at the token and routing layer
As models get larger and more MoE-heavy, debugging gets stranger. Latency spikes can come from routing instability, cache pressure, poor batching, or retrieval stalls upstream. AWS has the infrastructure depth to expose useful telemetry here. If it does, that would be genuinely useful.
Most AI observability is still too high-level. Engineers need to see where requests actually go sideways.
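Until platforms expose that telemetry natively, teams end up building it themselves. Here is a hypothetical sketch of the kind of per-stage breakdown that makes routing and cache problems visible; the stage names and the emit target are placeholders, not any AWS API.

```python
import time
from contextlib import contextmanager

class RequestTrace:
    """Accumulates per-stage wall-clock timings for one inference request.

    The stages here (retrieval, decode, etc.) are illustrative; the point is
    attributing a latency spike to a stage instead of averaging it away.
    """

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000  # ms

    def emit(self) -> None:
        # Placeholder: ship to CloudWatch, OpenTelemetry, or a log pipeline.
        print(self.request_id, {k: f"{v:.1f}ms" for k, v in self.stages.items()})

trace = RequestTrace("req-123")
with trace.stage("retrieval"):
    time.sleep(0.02)   # stand-in for a vector-store lookup
with trace.stage("decode"):
    time.sleep(0.05)   # stand-in for token generation
trace.emit()
```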
Security for regulated deployments
AWS has an opening with government and regulated industries because it already speaks their language: KMS, PrivateLink, VPC isolation, audit logs, policy controls, enclave-based isolation. If Amazon ties those cleanly into Bedrock and Nova, it has a cleaner enterprise story than many model providers that still look like API startups sitting on top of a GPU fleet.
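The plumbing for that story already exists on the client side. Assuming a VPC interface endpoint for Bedrock runtime has been provisioned, keeping inference traffic off the public internet is a configuration detail; the endpoint DNS name below is a placeholder.

```python
import boto3
from botocore.config import Config

# Hypothetical VPC interface endpoint (PrivateLink) for bedrock-runtime.
# With this set, requests resolve inside the VPC instead of the public API.
PRIVATE_ENDPOINT = (
    "https://vpce-0abc123-example.bedrock-runtime.us-east-1.vpce.amazonaws.com"
)

bedrock = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    endpoint_url=PRIVATE_ENDPOINT,
    config=Config(retries={"max_attempts": 3, "mode": "standard"}),
)
# From here, calls like bedrock.converse(...) traverse PrivateLink, and the
# usual IAM policies, CloudTrail audit logs, and KMS-encrypted storage apply.
```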
That won’t improve the models. It will make them easier to buy.
The limits are obvious too
Amazon’s strategy is coherent. It’s also late.
Microsoft still has the advantage in market perception because Azure became the default place to consume frontier AI through OpenAI. Google has spent years building its own silicon and has a tighter internal story around TPUs and Gemini than it usually gets credit for. AWS can catch up in enterprise adoption, but it isn't leading the model conversation today.
There’s execution risk too. Vertical integration works when the pieces fit cleanly. When they don’t, customers end up stuck between services that are supposedly optimized together but awkward to customize. AWS has a long track record of shipping powerful infrastructure that also asks customers to understand too many moving parts.
And quantum remains the least urgent piece of this. Braket has legitimate uses in research and hybrid workflows, but for most AI engineers it’s background noise unless Amazon can make orchestration and error mitigation meaningfully practical. That’s nowhere near the main competitive fight.
The broader direction still makes sense. Amazon wants the model, the chip, the runtime, and the control plane working in concert.
That’s a better use of its strengths than chasing chatbot headlines. Now it has to ship.