Microsoft Aurora uses transformers for fast weather and air quality forecasting
Microsoft’s new Aurora model makes a serious claim: forecasts for weather and air quality that run in seconds rather than hours, while matching or beating major operational systems on several difficult tasks.
According to Microsoft’s paper in Nature and its release materials, Aurora was trained on more than one million hours of atmospheric and environmental data, including satellite imagery, radar, station observations, and simulation outputs. In Microsoft’s tests, it predicted Typhoon Doksuri’s landfall four days in advance, beat the U.S. National Hurricane Center on five-day tropical cyclone track forecasts across the 2022 to 2023 season, and turned in strong results on air-quality forecasting and the 2022 Iraq sandstorm.
That’s an unusually broad set of results. Weather models don’t get judged on one tidy benchmark. They get judged on whether they hold up across different phenomena, scales, and forecast windows. Aurora looks built for that kind of test.
Why Aurora matters
Speed is the obvious draw. Traditional numerical weather prediction still depends on large simulation pipelines running on supercomputers. Aurora produces full-grid forecasts in seconds. That changes who can run forecasts, how often they can refresh them, and where they fit in software.
For logistics, utilities, insurance, emergency response, commodity trading, or city operations, forecast latency matters almost as much as forecast skill. A modest speedup is nice. A model that can rerun repeatedly on a single GPU instead of waiting on a full NWP pipeline changes the shape of the product.
It also changes cost. High-quality forecasting has mostly belonged to national weather agencies and well-funded research groups because physics-based models are expensive to run and maintain. Aurora doesn’t replace that world, but it lowers the barrier.
That’s why the open-source release matters. Without code and pretrained weights, Aurora would be another impressive internal result. With them, developers can test whether it actually works under local data, ugly constraints, and production noise.
The technical bet
Aurora follows a pattern that’s starting to look credible in scientific ML: learn state transitions directly from very large, messy datasets, then add enough physical structure to stop the model from wandering into nonsense.
At a high level, it combines:
- a convolutional front end to encode spatial inputs such as satellite and radar grids
- transformer-style attention layers to model long-range dependencies across space and time
- physics-informed losses that penalize physically implausible forecasts
- multi-scale forecast heads that emit outputs at different resolutions
That architecture fits the problem.
Convolutions handle local spatial structure well: cloud bands, fronts, storm cells, local gradients. Attention helps with the parts that don’t stay local: jet stream shifts, pressure systems, cyclone motion, pollutant transport. Those patterns span long distances and don’t fit neatly into small neighborhoods.
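As a rough intuition for how those pieces compose, here is a toy NumPy sketch, not Aurora’s actual architecture: the mean filter stands in for convolutional layers, the single untrained attention head stands in for the transformer stack, and the pooled/native outputs stand in for multi-scale heads.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_encode(grid, k=3):
    """Local spatial encoding: a k x k mean filter as a stand-in for conv layers."""
    H, W = grid.shape
    pad = k // 2
    padded = np.pad(grid, pad, mode="edge")
    out = np.zeros_like(grid)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def self_attention(tokens):
    """Single-head attention over flattened grid cells (long-range mixing)."""
    d = tokens.shape[-1]
    q = k = v = tokens  # untrained: identity projections, for illustration only
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# toy 8x8 pressure-anomaly grid
grid = rng.normal(size=(8, 8))
local = conv_encode(grid)                     # "convolutional front end"
tokens = local.reshape(-1, 1)                 # one token per grid cell
mixed = self_attention(tokens).reshape(8, 8)  # global dependencies

# "multi-scale heads": coarse (2x2-pooled) and fine (native-resolution) outputs
coarse = mixed.reshape(4, 2, 4, 2).mean(axis=(1, 3))
fine = mixed
```

The point of the sketch is the division of labor: the filter only sees a neighborhood, while every attention token can see every other grid cell.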
The physics-informed loss is an important detail. Without it, Aurora risks looking like another generic “transformers for everything” paper. Weather data is noisy, but the underlying system still obeys physical constraints. If a model gets low error while violating basic physical behavior, it’s learning shortcuts, not forecasting well.
Aurora still isn’t a physics simulator. It’s a learned model with guardrails meant to keep statistically plausible output from becoming meteorologically wrong.
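The shape of such a guardrail can be sketched in a few lines. This is a hypothetical toy version, not the constraints Microsoft actually uses: an MSE data term plus a penalty on changes in a conserved domain total, so a forecast that invents or destroys mass gets punished even if it looks statistically fine.

```python
import numpy as np

def physics_informed_loss(forecast, target, prev_state, lam=0.1):
    """MSE data term plus a crude conservation penalty.

    `forecast`, `target`, `prev_state` are 2-D fields of a conserved quantity
    (say, column-integrated moisture). The physics term penalizes changes in
    the domain total between the previous state and the forecast.
    """
    data_term = np.mean((forecast - target) ** 2)
    conservation_term = (forecast.sum() - prev_state.sum()) ** 2 / forecast.size
    return data_term + lam * conservation_term

rng = np.random.default_rng(1)
prev = rng.random((16, 16))
truth = prev + rng.normal(scale=0.01, size=prev.shape)

good = truth.copy()     # accurate and (nearly) mass-conserving
bad = truth + 0.5       # inflates the domain total everywhere
loss_good = physics_informed_loss(good, truth, prev)
loss_bad = physics_informed_loss(bad, truth, prev)
```

Here `loss_bad` dwarfs `loss_good` mostly through the conservation term, which is the behavior a physics-informed loss is meant to enforce.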
Why the multi-scale output matters
One of Aurora’s stronger design choices is its multi-scale forecasting heads. Microsoft says the model can generate forecasts on global grids such as 0.25° × 0.25° and also much finer local grids, down to around 0.01° for mesoscale events.
That matters because weather use cases split hard by scale.
A shipping company cares about the broad synoptic picture: storm track, wave conditions, major pressure systems. A city air-quality team cares about much more local behavior. So does a grid operator trying to estimate solar dips from fast-moving cloud cover. A model that can work across those scales is much more useful than a narrow specialist.
It’s also technically difficult. High-resolution output is where models often get expensive, unstable, or noisy. If Aurora can do that without wrecking inference cost, that’s a real engineering win.
Microsoft says it uses mixed-precision inference, quantization, and CUDA-optimized fused kernels that combine convolution, attention, and upsampling to keep runtime down to seconds on a single GPU. That sounds like production-minded work, not a paper demo.
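The precision half of that claim is easy to illustrate; the fused CUDA kernels and quantization are out of scope here. The sketch below runs an invented stand-in forecast layer at float16 and compares it against a float32 reference, the usual mixed-precision pattern of computing in half precision and accumulating the result back in full precision.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(256, 256)).astype(np.float32)
x = rng.normal(size=(256,)).astype(np.float32)

def forward(weights, inp):
    """One dense layer with a ReLU, standing in for a forecast step."""
    return np.maximum(weights @ inp, 0.0)

# full-precision reference
y32 = forward(W, x)

# half-precision pass: cast weights and activations to float16,
# then carry the result back in float32
y16 = forward(W.astype(np.float16), x.astype(np.float16)).astype(np.float32)

rel_err = np.abs(y32 - y16).max() / np.abs(y32).max()
```

On a real GPU the float16 path also roughly halves memory traffic, which is where most of the speedup comes from; the trade is the small `rel_err` measured above.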
Where Aurora fits next to FourCastNet and WeatherNext
Aurora doesn’t come out of nowhere. It sits in the same line as NVIDIA’s FourCastNet and Google DeepMind’s weather models, both of which have already shown that learned forecasting can compete with or outperform classical baselines on some tasks.
Microsoft seems to be pushing harder on breadth. Aurora is framed less as a model for one benchmark and more as a foundation model for Earth system prediction. Tropical cyclones, sandstorms, and PM2.5 forecasting are different operational problems, but they all depend on large spatiotemporal representations.
That framing makes sense. Teams don’t want five disconnected models if one trained backbone can be fine-tuned across related prediction tasks.
Still, “foundation model” is usually where the evidence starts getting stretched. Forecasting systems fail unevenly. Regional microclimates, sparse sensor coverage, local topography, data shifts, and extreme events all expose weaknesses that broad benchmarks can smooth over. Aurora looks strong. Anyone deploying it should still assume local validation is mandatory.
What developers and ML teams should care about
The open-source release is the practical part.
If you run an ML platform team or applied research group, Aurora gives you a starting point for domain adaptation instead of a blank notebook. That changes the build versus buy calculation.
A few use cases stand out.
Fine-tuning with local data
Regional forecasting is where pretraining helps but doesn’t finish the job. If you have coastal sensors, utility telemetry, traffic-linked pollution data, or dense municipal weather stations, Aurora’s pretrained weights could be a much better base than training from scratch.
That’s especially interesting for air quality. PM2.5 forecasts depend on meteorology, terrain, emissions, and transport. A model already trained to connect atmospheric state with pollutant behavior starts ahead.
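A minimal version of that workflow, with everything here hypothetical: a frozen random projection stands in for Aurora’s pretrained encoder, a synthetic “local” dataset stands in for your station data, and fine-tuning means fitting only a lightweight head on frozen features.

```python
import numpy as np

rng = np.random.default_rng(3)

def backbone(x, proj):
    """Stand-in 'pretrained backbone': a fixed nonlinear projection.
    In practice this would be Aurora's frozen encoder producing features."""
    return np.tanh(x @ proj)

n_raw, n_feat = 12, 32
proj = rng.normal(size=(n_raw, n_feat)) / np.sqrt(n_raw)  # frozen weights

# hypothetical local dataset: station readings -> next-hour PM2.5
X_local = rng.normal(size=(200, n_raw))
true_w = rng.normal(size=n_feat)
y_local = backbone(X_local, proj) @ true_w + rng.normal(scale=0.05, size=200)

# fine-tune: fit only a small linear head on the frozen features
feats = backbone(X_local, proj)
head, *_ = np.linalg.lstsq(feats, y_local, rcond=None)

pred = feats @ head
rmse = np.sqrt(np.mean((pred - y_local) ** 2))
```

The design point is that only `head` is trained; with a few hundred local samples that is feasible, whereas retraining the backbone would not be.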
Running inference inside software
Because Aurora runs quickly, forecasts can be treated as something an application queries often rather than a batch artifact refreshed a couple of times per day.
That opens up systems like:
- route-planning tools that continuously re-score paths
- energy platforms that reforecast solar and wind output every few minutes
- emergency tools that update hazard maps as observations come in
- urban operations dashboards that refresh AQI risk in near real time
That’s a different product architecture from consuming a slow external feed.
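The pattern looks something like this toy loop: a cheap stand-in model re-forecasts every time a new observation arrives, instead of waiting on a batch cycle. The model, data, and cadence are all placeholders.

```python
import time

def run_forecast(observations):
    """Stand-in for a fast learned model: persistence plus linear trend."""
    latest, prev = observations[-1], observations[-2]
    return latest + (latest - prev)  # one-step extrapolation

def reforecast_loop(stream, interval_s=0.01, steps=5):
    """Re-run the model every `interval_s` seconds as observations arrive,
    the pattern that seconds-scale inference enables (vs. a twice-daily
    batch product)."""
    forecasts = []
    obs = list(stream[:2])
    for new_obs in stream[2:2 + steps]:
        obs.append(new_obs)
        forecasts.append(run_forecast(obs))
        time.sleep(interval_s)  # placeholder for waiting on the next data tick
    return forecasts

temps = [20.0, 20.5, 21.0, 21.2, 21.1, 21.4, 21.6]
forecasts = reforecast_loop(temps)
```

Swap `run_forecast` for a real model call and `time.sleep` for a message-queue consumer and you have the skeleton of the application-side architecture.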
Hybrid stacks
The best near-term use for models like Aurora may be alongside classical NWP rather than as a replacement. Use numerical models where physics-heavy simulation still wins. Use learned models for fast updates, bias correction, downscaling, and scenario generation.
That hybrid setup is easier to sell inside serious organizations because it fits existing validation and governance processes.
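A small example of the bias-correction piece of that hybrid: fit a linear post-processor on synthetic “NWP” output with a systematic warm bias, then apply it to held-out forecasts. The bias and noise levels are invented, and a real post-processor would be a learned model rather than a two-parameter fit.

```python
import numpy as np

rng = np.random.default_rng(4)

# synthetic setup: NWP forecasts with a systematic warm bias plus noise
truth = rng.normal(15.0, 5.0, size=500)
nwp = truth + 1.8 + rng.normal(scale=0.7, size=500)  # biased "physics" output

# post-processor: linear correction fit on a training split
train, test = slice(0, 400), slice(400, 500)
A = np.stack([nwp[train], np.ones(400)], axis=1)
coef, *_ = np.linalg.lstsq(A, truth[train], rcond=None)

corrected = coef[0] * nwp[test] + coef[1]
raw_rmse = np.sqrt(np.mean((nwp[test] - truth[test]) ** 2))
fix_rmse = np.sqrt(np.mean((corrected - truth[test]) ** 2))
```

The appeal for governance is exactly that the physics model stays in the loop: the learned component only adjusts its output, and its effect is measurable as `raw_rmse` versus `fix_rmse`.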
The trade-offs are real
Aurora’s speed is compelling. Its limits are fairly obvious too.
First, data quality still rules. A large model trained on heterogeneous environmental data can absorb a lot of signal, but weak regional coverage and stale sensors still hurt.
Second, extreme events are where confidence becomes dangerous. Benchmark wins on storms matter, but rare events are exactly where uncertainty calibration matters most. A point forecast alone isn’t enough.
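A toy illustration of why: run an ensemble of perturbed initial conditions through a nonlinear step and report a spread instead of a single number. The logistic-map dynamics here are a stand-in for a real model, chosen only because small initial differences visibly grow.

```python
import numpy as np

rng = np.random.default_rng(5)

def model_step(state):
    """Toy nonlinear forecast step (logistic map). Small initial differences
    can grow rapidly, which is exactly why point forecasts are risky for
    extreme events."""
    return 3.7 * state * (1.0 - state)

def ensemble_forecast(x0, n_members=50, n_steps=10, sigma=0.01):
    """Perturb the initial condition, step every member forward, return all."""
    members = np.clip(x0 + rng.normal(scale=sigma, size=n_members), 0.01, 0.99)
    for _ in range(n_steps):
        members = model_step(members)
    return members

members = ensemble_forecast(0.4)
lo, hi = np.percentile(members, [5, 95])  # a 90% forecast interval
```

Fast inference makes this cheap: fifty reruns of a seconds-scale model is still seconds, which is one concrete way learned forecasters can deliver calibrated intervals rather than a single trajectory.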
Third, open weights don’t remove the infrastructure work. For production use, you still need ingestion pipelines for satellite feeds, radar, station logs, and validation data. You need drift monitoring, retraining cadence, and guardrails for outputs that look plausible but are wrong.
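Drift monitoring can start very simply. A sketch, assuming you track a standardized mean shift between training-era and recent inputs; a production system would monitor many features and use proper statistical tests rather than this single score.

```python
import numpy as np

rng = np.random.default_rng(6)

def drift_score(reference, recent):
    """Standardized mean shift between training-era and recent inputs.
    A crude drift signal: large values mean the recent data no longer
    looks like what the model was trained on."""
    pooled = np.sqrt(reference.var() / len(reference) + recent.var() / len(recent))
    return abs(recent.mean() - reference.mean()) / pooled

reference = rng.normal(10.0, 2.0, size=5000)  # e.g. historical station temps
stable = rng.normal(10.0, 2.0, size=500)      # recent data, same regime
shifted = rng.normal(11.0, 2.0, size=500)     # sensor recalibration / regime shift
```

A score near zero on `stable` and a large score on `shifted` is the signal you would wire to an alert, a retraining trigger, or a fallback to the classical forecast feed.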
There’s also a governance problem. Environmental forecasts feed decisions with legal and public-safety consequences. If an AI forecast is going in front of dispatch teams, utilities, or public agencies, you need auditability and fallback paths. “The model said so” won’t survive contact with the real world.
The shift underneath this
Aurora points to a change that’s been building for a while: forecasting is moving from pure simulation toward learned inference over large environmental datasets.
That doesn’t kill classical weather modeling. It does change the surrounding stack. Forecasting starts to look more like modern ML infrastructure, with pretrained backbones, domain fine-tuning, GPU-optimized inference, continuous evaluation, and task-specific deployment.
Microsoft has released a model that looks technically serious, operationally relevant, and open enough to matter outside its own cloud. If Aurora’s results hold up in the field, weather prediction gets cheaper, faster, and easier to wire into products. That’s a meaningful shift.