March 17, 2026

Niv AI says millisecond power control can recover lost GPU capacity

Niv AI thinks the bottleneck in GPU clusters is power timing, not GPU count

Niv AI has emerged from stealth with a pointed claim: AI data centers leave real GPU capacity on the table because their power systems can’t absorb short, synchronized spikes cleanly.

The startup says it measures rack-level power at millisecond resolution, then uses that data to smooth demand before an operator is forced into throttling, battery overbuild, or both. It’s launching with a $12 million seed round led by Glilot Capital and Grove Ventures, with Arc VC, Encoded VC, Leap Forward, and Aurora Capital Partners also participating.

That may sound like facilities software. For teams running large training jobs, it reaches straight into scheduler design, NCCL behavior, GPU power caps, and the ugly math of operating a dense AI cluster when site power is already tight.

Why this matters

The obvious shortage is GPUs. The quieter one is power capacity.

A lot of planned AI capacity is stuck behind utility and interconnect schedules that stretch for years. Operators still have to keep H100- and B200-class systems fed, cooled, and inside site power limits. When a cluster produces sharp bursts, sites usually fall back on two options:

  • oversize UPS and battery systems to absorb the spikes
  • throttle cluster performance to stay within electrical limits

Both are expensive. One burns capital. The other burns GPU time, which may be worse.

That’s the gap Niv AI is targeting. Founders Tomer Timor and Edward Kizis are building sensing hardware and a control layer that aims to predict near-term demand and shape it in real time. The company says pilots in several US data centers should begin within six to eight months.

The timing is good. Operators have already squeezed plenty out of the usual levers: kernels, compilers, parallelism strategies, interconnect topologies, cooling. Power behavior is one of the few knobs left that cuts across the whole stack.

The problem is real

GPU clusters don’t draw power smoothly. They jump.

A training step can swing between compute-heavy tensor core work and communication-heavy phases such as all_reduce or all_to_all over NCCL. Those phases draw power differently. In a large synchronized job, thousands of GPUs can hit the same boundary at nearly the same moment. That creates a brief site-level surge.

Telemetry at one-second resolution mostly hides this. The average can look fine. The electrical system still sees the spike.

That matters because utilities, UPS gear, and power distribution care about the transient, not the one-minute average. A cluster can stay within a comfortable-looking mean while still jumping by megawatts in a narrow window. That’s why “just look at nvidia-smi” has never been enough.

Built-in GPU telemetry helps, but it doesn’t solve this problem. Tools like NVML and DCGM expose power and utilization data, yet their sampling and smoothing can blur sub-10 ms events. And GPU-reported draw isn’t the same thing as what the rack or branch circuit is actually pulling. Once you care about electrical transients, rack-level sensing matters more than device self-reporting.
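
To see the limitation concretely, here’s a minimal polling sketch against NVML’s Python bindings (pynvml). It only illustrates the point above: no matter how fast the loop asks, it reads whatever value the driver last refreshed.

import time
import pynvml  # ships as the nvidia-ml-py package

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

samples = []
t_end = time.monotonic() + 1.0          # poll for one second
while time.monotonic() < t_end:
    mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # driver-reported milliwatts
    samples.append(mw / 1000.0)
    time.sleep(0.001)                   # ~1 kHz request rate

# Long runs of identical readings suggest the driver refreshes the
# counter far less often than we ask for it.
repeats = sum(1 for a, b in zip(samples, samples[1:]) if a == b)
print(f"{len(samples)} reads, {repeats} unchanged back-to-back")
pynvml.nvmlShutdown()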

That part of Niv AI’s pitch holds up. If you want to treat power as a control problem, you need measurements from the power path itself.

What Niv AI is probably building

The company hasn’t published a detailed architecture, but the broad shape is easy to guess.

Start with rack-side or PDU-side sensors sampling power at high frequency, probably with familiar approaches like shunt measurement, Hall-effect sensors, or PMBus-connected metering. You’d also want synchronized clocks across feeds so events line up across racks.

Then comes a local ingest layer, whether they call it an edge gateway, row controller, or rack controller. Its job is to collect samples, align timestamps, extract useful features, and feed a streaming telemetry pipeline.
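
A toy version of that ingest step, assuming the per-rack streams already share a synchronized millisecond clock (in practice, PTP-disciplined time sources): sum racks into a site-level trace and emit per-window features. The feature names are illustrative, not anyone’s actual schema.

import numpy as np

def window_features(samples_w, window_ms=100):
    # samples_w: array of shape (racks, milliseconds), already aligned
    # onto a shared clock. Sum racks first: the transient that matters
    # is the site-level one.
    racks, n = samples_w.shape
    n_win = n // window_ms
    site = samples_w[:, : n_win * window_ms].sum(axis=0)
    windows = site.reshape(n_win, window_ms)
    return {
        "mean_w": windows.mean(axis=1),
        "peak_w": windows.max(axis=1),
        "max_ramp_w_per_ms": np.abs(np.diff(windows, axis=1)).max(axis=1),
    }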

After that comes the hard part: deciding when to act and what to change.

The right mental model here is short-horizon predictive control. You want a forecast of the next 100 to 200 milliseconds and you want to intervene before the spike arrives. A feedback loop that reacts after the event is too slow. It also risks oscillating if it keeps chasing noise.

That points to some mix of:

  • fast heuristics for immediate response
  • a forecasting model trained on recent telemetry and job state
  • a control optimizer that applies small scheduling or power-cap changes across many nodes

This is one of those cases where old-fashioned control theory probably matters more than a fancy neural net. The forecasting model can be lightweight. The hard part is stability and response quality.
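
Here’s a deliberately crude sketch of that shape. Everything in it is assumed: 1 ms rack samples, linear extrapolation standing in for the forecasting model, and a single hypothetical actuator, apply_rack_cap_w. The structure, act before the predicted peak arrives, is the point.

from collections import deque

WINDOW_MS = 50          # history the forecaster sees
HORIZON_MS = 150        # how far ahead we predict
PEAK_LIMIT_W = 40_000   # per-rack electrical budget (assumed)

history = deque(maxlen=WINDOW_MS)

def forecast_w(hist):
    # Linear extrapolation: last sample plus recent slope * horizon.
    if len(hist) < 2:
        return hist[-1] if hist else 0.0
    slope = (hist[-1] - hist[0]) / (len(hist) - 1)  # watts per ms
    return hist[-1] + slope * HORIZON_MS

def control_tick(sample_w, apply_rack_cap_w):
    # Called once per millisecond sample; acts before the spike lands.
    history.append(sample_w)
    if forecast_w(history) > PEAK_LIMIT_W:
        apply_rack_cap_w(PEAK_LIMIT_W)  # clamp the rack at its budget
    else:
        apply_rack_cap_w(None)          # no intervention needed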

How a control plane could act without trashing throughput

There are already ways to shape power in a GPU cluster. None are magic. Together, they’re enough to matter.

Dynamic power caps

Nvidia’s management stack already lets operators set device power limits and application clocks. A control layer could shave the top off a burst for a few hundred milliseconds with commands like:

nvidia-smi -q -d POWER                 # inspect current draw and limit range
sudo nvidia-smi -pl 425                # cap board power at 425 W (root required)
sudo nvidia-smi -ac <memory,graphics>  # pin application clocks, MHz pair

If you do this carefully, short-duration caps can trim peaks with little effect on end-to-end step time. If you do it badly, you’ve just made very expensive GPUs slower.
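
For a loop operating on millisecond timescales, shelling out to nvidia-smi is too slow. The same cap is reachable programmatically through NVML; a minimal sketch, with the 425 W target chosen arbitrarily and root privileges assumed:

import pynvml

pynvml.nvmlInit()
h = pynvml.nvmlDeviceGetHandleByIndex(0)

# Clamp the desired 425 W into the range the board actually supports.
lo_mw, hi_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(h)
pynvml.nvmlDeviceSetPowerManagementLimit(h, max(lo_mw, min(hi_mw, 425_000)))
pynvml.nvmlShutdown()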

Communication shaping

This may be the more interesting lever.

If a cluster-wide all_reduce launches in lockstep across racks, the power signature can line up enough to produce a nasty spike. A scheduler or communication runtime could introduce small offsets, switch collective algorithms, or spread job cohorts so they stop peaking together. A few milliseconds of jitter may be enough.
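
As a sketch of the offset idea, with the cohort mapping and the 2 ms figure purely assumed: each group of ranks delays its entry into the collective by a small, deterministic amount.

import time
import torch.distributed as dist

def staggered_all_reduce(tensor, ranks_per_rack=8, jitter_ms=2.0):
    # Assumes dist.init_process_group() has already run.
    # Crude cohort assignment: consecutive ranks share a rack.
    cohort = dist.get_rank() // ranks_per_rack
    # Deterministic, per-cohort delay so racks stop ramping in lockstep.
    time.sleep(cohort * jitter_ms / 1000.0)
    # The collective still synchronizes everyone, so the added step
    # latency is bounded by the largest cohort offset.
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)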

It sounds easy. At scale it isn’t. Small timing changes in collectives can show up in throughput, tail latency, and fairness across jobs.

Scheduler-aware placement

In Slurm, Kubernetes with Volcano, or a custom scheduler, jobs could be placed based not just on memory and accelerator availability but also on expected power profile. If one workload is bursty and another is relatively flat, putting them on the same PDU is asking for trouble.
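
A toy placement filter under assumed inputs, where each job carries a burstiness estimate and each PDU a peak budget. The Job and Pdu shapes are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    avg_w: float
    peak_to_avg: float            # burstiness estimate from past runs

@dataclass
class Pdu:
    name: str
    budget_w: float
    committed_peak_w: float = 0.0

def place(job, pdus):
    # Reserve for the peak, not the average: two bursty jobs that look
    # fine on average can still align and trip the same branch circuit.
    need_w = job.avg_w * job.peak_to_avg
    candidates = [p for p in pdus if p.committed_peak_w + need_w <= p.budget_w]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: p.budget_w - p.committed_peak_w)
    best.committed_peak_w += need_w
    return best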

This is where the product could become genuinely useful for platform teams. Better telemetry matters only if it feeds placement and runtime policy.

MIG and partitioned GPUs

With MIG, one physical GPU turns into several smaller compute instances. Good for utilization. Also good at creating new ways to synchronize demand by accident. A scheduler should treat those partitions as power participants, not just capacity units.
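
A tiny illustration of that point, with the numbers invented: slices should be aggregated per physical board before a placement decision, because they share one power envelope.

board_budget_w = 700                 # one SXM board's envelope (assumed)
slice_demand_w = {"1g.10gb": 90, "3g.40gb": 250}  # rough per-slice draw

def board_ok(slices):
    # Aggregate per physical GPU: slices are separate capacity units
    # but share one board's power envelope.
    return sum(slice_demand_w[s] for s in slices) <= board_budget_w

print(board_ok(["3g.40gb", "3g.40gb", "1g.10gb"]))  # True: 590 W of 700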

Where the trade-offs show up

The promise is appealing. The trade-offs are real.

The easiest mistake in this category is treating a smoother power curve as proof of a better cluster. Sometimes it just means the cluster got slower in a way that’s less obvious.

The metric that matters is straightforward: did peak reduction come with only a tiny throughput hit? If Niv AI can keep performance impact under roughly 1 to 2 percent while materially reducing p99 spikes, operators will care. If the penalty drifts toward 5 percent, the economics look a lot worse.
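
A back-of-envelope version of that economics, with every number an assumption:

gpus = 16_000
cost_per_gpu_hour = 2.00     # assumed effective all-in rate
hours_per_year = 8_760

def penalty_cost(pct):
    return gpus * cost_per_gpu_hour * hours_per_year * pct

print(f"1% throughput penalty: ${penalty_cost(0.01):,.0f}/yr")  # ~$2.8M
print(f"5% throughput penalty: ${penalty_cost(0.05):,.0f}/yr")  # ~$14.0M

At that scale, the gap between those two penalties is the difference between a rounding error and a line item.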

There are other problems to solve too.

Control stability

Poorly tuned control loops oscillate. That risk is higher in AI clusters than some teams may expect because workloads are already bursty, tightly synchronized, and sensitive to timing. A naive PID controller can overshoot, react to stale signals, and make the system noisier.
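
Two standard guards from control practice, sketched with illustrative constants: a slew-rate limit so the cap can’t whipsaw, and a deadband so the loop stops chasing noise near its setpoint.

MAX_STEP_W = 500       # largest cap change allowed per control tick
DEADBAND_W = 1_000     # error band inside which the loop holds still

def next_cap_w(current_w, target_w):
    error = target_w - current_w
    if abs(error) < DEADBAND_W:
        return current_w                       # ignore noise near target
    step = max(-MAX_STEP_W, min(MAX_STEP_W, error))
    return current_w + step                    # slew-rate limited move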

Fairness in shared environments

If a cluster has multiple tenants, who gets power-capped during a peak? If one customer’s training run keeps getting slowed to protect another’s SLO, the platform team has a political problem on top of the technical one.

Sensor quality and calibration

High-rate rack sensing sounds great until you run into drift, noise, or bad calibration. False positives in the control loop are expensive. Data center electrical environments are messy enough that instrumentation quality matters a lot.

Security and blast radius

Any system that can change power caps, scheduler timing, or rack-level control policy has real operational authority. It needs tight access controls, audit trails, and sane failure modes. A buggy observability agent is annoying. A buggy power controller can wreck performance across an entire row.

What teams should watch

If you run training at any real scale, this matters less as startup news than as a sign of where infrastructure pressure is moving.

Power telemetry is joining utilization, memory bandwidth, and network contention as a scheduling signal. That has consequences:

  • ML platform teams should expect pressure to expose power-aware controls through schedulers and runtime policy.
  • Distributed training engineers may need to pay more attention to collective timing and phase alignment, not just raw communication speed.
  • Data center operators get a possible alternative to blunt-force throttling and costly electrical overbuild.
  • Tooling vendors in observability, orchestration, and DCIM have another integration point to think about.

For years, the AI stack treated power as a static constraint. That assumption is getting weaker. In modern GPU clusters, power behaves like a dynamic resource with its own bursts, latency, and coordination failures.

Niv AI still has to prove this works outside a slide deck and a few design partnerships. The idea is plausible. The execution risk is high. Getting telemetry quality, control loops, and scheduler integration right at the same time is a serious engineering job.

Still, the target makes sense. If your cluster is already GPU-rich and power-poor, the next efficiency gain probably won’t come from another kernel trick. It may come from turning 200 milliseconds of electrical behavior into a software problem.
