Artificial Intelligence July 25, 2025

Defense AI at TechCrunch Disrupt 2025: Why resilient systems mattered more than larger models

DARPA, the Navy, and the hard part of defense AI

At TechCrunch Disrupt 2025, the most useful discussion on defense AI stayed away from bigger models and flashy demos. It focused on systems that still work when communications are unreliable, hardware is limited, and the operator has to trust the output enough to act.

That’s the hard part.

The panel brought together Dr. Kathleen Fisher, head of DARPA’s Information Innovation Office, former In-Q-Tel executive Sri Chandrasekar, and Department of the Navy CTO Justin Fanelli. Their message was clear enough: defense is moving past pilot projects. The work now is autonomous coordination, explainability that holds up under pressure, and software supply chains that don’t turn into attack paths.

That matters to developers and ML engineers because defense is forcing a level of rigor that commercial AI still often sidesteps.

Autonomy when the network breaks

The most interesting technical thread was DARPA’s focus on multi-agent reinforcement learning for drone swarms, autonomous vehicles, and maritime robots. The research area isn’t new. The operating conditions are what matter.

These systems are being built for environments where GPS may be degraded, links may be jammed, and central coordination may disappear at the worst possible time. In that setting, a lot of normal cloud assumptions collapse fast. There may be no clean control plane. No single coordinator with the full picture. No time to wait on a verbose model output while a vehicle is moving through contested airspace.

DARPA’s approach, at least from the panel, rests on three pieces:

  • Hierarchical reinforcement learning, with one policy handling mission-level decisions and another handling lower-level control
  • Decentralized consensus, so a swarm can maintain shared intent with intermittent connectivity
  • Adversarial training, with simulated red-team interference built into development

That stack tracks. Hierarchical RL is one of the few practical ways to keep multi-agent systems manageable when decisions happen on different time scales. A swarm doesn’t need every agent solving the whole mission all the time. It needs a sane split between local reflexes and global objectives.

The reference pseudo-code for the hierarchical update loop was simple, but the point was right. High-level actions span multiple low-level steps. Rewards roll up. Policies update at different layers. That’s a lot more robust than stuffing mission planning and motor control into one brittle model.
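A minimal sketch of what such a hierarchical loop might look like, with every name hypothetical (this is not DARPA's actual code): a high-level policy picks a subgoal that spans K low-level steps, the low-level policy acts against that subgoal, and rewards roll up per layer.

```python
import random

K = 5  # low-level steps per high-level decision

def high_level_policy(state):
    """Stub: choose a mission-level subgoal (e.g. hold position, advance)."""
    return random.choice(["hold", "advance", "regroup"])

def low_level_policy(state, subgoal):
    """Stub: choose a control action toward the current subgoal."""
    return random.uniform(-1.0, 1.0)

def env_step(state, action):
    """Stub environment: returns next state and a scalar reward."""
    return state + 1, 1.0

def run_episode(steps=20):
    state, hi_return, logs = 0, 0.0, []
    for t in range(0, steps, K):
        subgoal = high_level_policy(state)  # one decision spans K steps
        lo_return = 0.0
        for _ in range(K):
            action = low_level_policy(state, subgoal)
            state, reward = env_step(state, action)
            lo_return += reward             # low-level reward signal
        hi_return += lo_return              # rolls up to the high level
        logs.append((subgoal, lo_return))
        # a real system would update each policy here, on its own time scale
    return hi_return, logs
```

The point of the split is visible even in the stub: the high-level policy only makes `steps / K` decisions per episode, so mission planning and motor control never compete for the same update cadence.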

The hard part is validation. Training a swarm in simulation is manageable. Showing that it won’t go sideways under ugly edge cases is much harder. Defense can only tolerate so much autonomy before failure modes get too opaque.

So the panel kept coming back to trust.

Explainability that holds up in the field

Fisher put it plainly: if you can’t trust the AI, you can’t use it in high-stakes scenarios.

Defense uses a stricter definition of trust than most enterprise AI teams do. A dashboard confidence score doesn’t cut it. Operators need some account of why a system recommended an action, what would have changed that recommendation, and how uncertain the model is under current conditions.

The panel highlighted three approaches DARPA is pushing:

  • Counterfactual reasoning, which asks what input changes would have produced a different output
  • Prototype-based classification, where models point to representative examples instead of buried latent features
  • Bayesian uncertainty calibration, which fits messy sensor fusion a lot better than the false confidence many modern models project
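One simple stand-in for the calibration idea is ensemble disagreement: if independent models diverge on an input, the system should report that rather than a single confident number. A toy sketch, with made-up linear models in place of trained networks:

```python
import statistics

# Three toy "models"; a real system would use independently trained networks
ensemble = [
    lambda x: 0.9 * x + 0.1,
    lambda x: 1.1 * x - 0.2,
    lambda x: 1.0 * x + 0.0,
]

def predict_with_uncertainty(x):
    """Return predictive mean and spread across ensemble members."""
    preds = [m(x) for m in ensemble]
    mean = statistics.fmean(preds)
    spread = statistics.stdev(preds)  # disagreement as an uncertainty proxy
    return mean, spread

def actionable(mean, spread, max_spread=0.5):
    """Gate operator-facing recommendations when the ensemble disagrees."""
    return spread <= max_spread
```

The `max_spread` threshold is the operator-facing part: instead of surfacing a bare score, the system can refuse to recommend an action when its members disagree beyond a calibrated bound.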

This is one area where defense is ahead of a lot of commercial deployment. Consumer products can get away with shallow explainability because the downside of a wrong answer is often just annoyance. In military and critical infrastructure settings, a bad recommendation can escalate an incident, misidentify a target, or sink a mission.

The tooling also has to run at the edge, on ruggedized hardware, fast enough to support decisions in real time. That’s a much tougher engineering problem than dropping SHAP plots into a notebook.

It also runs straight into compliance. The panel referenced NIST SP 800-53, and that isn’t paperwork for its own sake. If a model feeds into a mission decision, you need traceability, control mapping, and something defensible in an audit. Teams that still treat explainability as a UX accessory are going to have a bad time in this market.

The Navy’s zero-trust push

Justin Fanelli’s comments on the Navy’s software stack were probably the most broadly useful part of the panel for teams outside defense.

He focused on a zero-trust supply chain across cloud environments, shipboard clusters, and mobile command systems. The ideas are familiar. The implementation details matter:

  • Ephemeral workloads that spin up, run a mission-specific model, and terminate
  • Cryptographically signed artifacts for both container images and ML models
  • Validation through an on-prem HSM
  • mTLS and service mesh controls for east-west traffic
  • Least-privilege defaults across distributed services

None of this is glamorous. All of it is solid engineering.

In defense, the consequences of lateral movement are harsher. But the same pattern applies in commercial AI. Model serving has widened the attack surface. Every feature store, inference endpoint, vector database, CI runner, and internal package registry is another place to get burned.

A lot of teams still treat model artifacts as semi-trusted blobs. That’s sloppy. If you sign your binaries but not your model weights, you’ve left an obvious hole.
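The fix is mechanically simple. An illustrative sketch using an HMAC over the weight bytes; a production pipeline would use asymmetric signatures with an HSM-held key (e.g. via a tool like cosign), not a shared secret baked into code:

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical; a real key would live in an HSM

def sign_artifact(data: bytes) -> str:
    """Produce a signature over raw artifact bytes (weights, containers)."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()

def verify_artifact(data: bytes, signature: str) -> bool:
    """Constant-time check before an artifact is allowed to load."""
    return hmac.compare_digest(sign_artifact(data), signature)

weights = b"\x00\x01fake-model-weights"   # stand-in for a weights file
sig = sign_artifact(weights)

assert verify_artifact(weights, sig)              # untampered: accepted
assert not verify_artifact(weights + b"!", sig)   # modified: rejected
```

The verify step belongs at model-load time, not just in CI: a weights file that changes between build and deploy should fail exactly like an unsigned binary would.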

The panel’s version of secure AI pipelines was refreshingly concrete: identity, provenance, artifact signing, network segmentation, short-lived compute. Basic controls, applied properly.

That’s what real systems look like.

Why this work will spread

Chandrasekar, coming from In-Q-Tel, pointed to the investment angle. There’s a growing market for dual-use infrastructure: MLOps tools that survive disconnected operations, autonomy frameworks that run on edge devices, and interoperability layers for ugly mixed environments.

That last part matters.

Defense rarely gets a clean slate. It has to integrate with legacy systems, classified networks, and standards like JADC2 data models. The work is less glamorous than greenfield AI pitches, but far more useful. Vendors that can plug into messy operational stacks have a better shot than startups selling elegant architectures that only work in ideal conditions.

There’s an obvious spillover path here. The same properties defense wants also matter in civilian sectors that can’t afford brittle AI:

  • energy grids
  • transportation systems
  • ports and logistics
  • industrial control environments
  • emergency response networks

Those systems deal with intermittent connectivity, hostile environments, old infrastructure, and serious consequences when automation fails. Reliability patterns built for defense tend to travel.

The trade-offs don’t go away

It’s easy to come away from a panel like this with a neat formula: build autonomous systems, add explainability, secure the pipeline, done.

The engineering reality is messier.

Autonomy vs controllability: More local decision-making helps when communications fail, but it also creates more chances for edge-case behavior that operators can’t quickly inspect.

Explainability vs latency: Counterfactuals and uncertainty estimates are useful, but they cost compute. On constrained edge hardware, every extra inference path has a power and thermal bill.

Security vs deployability: Signed artifacts, enclaves, and strict identity controls reduce risk, but they also complicate updates, add integration pain, and stretch already slow certification cycles.

Robustness vs iteration speed: Once you add adversarial testing, provenance tracking, and ATO timelines, model release velocity drops. In defense, that’s usually the right trade.

The panel also touched on secure enclaves like AMD SEV and Intel SGX, along with adversarial techniques such as FGSM and PGD running in CI pipelines. That points to where this work is heading: security built into the ML lifecycle instead of pasted on after training. Engineers in regulated environments should pay attention. This will spread.
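FGSM is simple enough to show in a few lines, which is part of why it shows up in CI checks. A hand-coded sketch on a toy logistic model (weights and inputs are invented; a real pipeline would run this against the trained model using a framework's autograd):

```python
import math

w = [2.0, -1.0]  # toy logistic-regression weights
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    """Probability of the positive class under the toy model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(x, y, eps):
    """One FGSM step: nudge x along the sign of the loss gradient."""
    p = predict(x)
    grad = [(p - y) * wi for wi in w]  # d(cross-entropy)/dx for this model
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

x, y = [1.0, 0.0], 1.0
x_adv = fgsm(x, y, eps=0.5)
# the model's confidence on the true class should drop under perturbation
assert predict(x_adv) < predict(x)
```

A CI gate built on this idea would fail the build when confidence collapses by more than a set margin under a bounded perturbation, the same way a unit test fails on a regression.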

What technical leaders should take from it

A few signals stand out.

Edge AI is getting serious. If your architecture assumes stable broadband and centralized inference, you’re designing for the easy case.

Model provenance is becoming table stakes. Signed containers aren’t enough. Signed models, verified dependency chains, and reproducible training metadata are moving into the expected baseline.

Explainability is also getting narrower and more useful. The question isn’t whether a model is interpretable in some abstract sense. It’s whether an operator can act on the output with a clear read on confidence, alternatives, and failure risk.

And defense buyers are going to reward interoperability over novelty. Teams selling into this market should spend less time on frontier-model magic and more time surviving ugly integration work, compliance review, and hostile operating conditions.

That may be less exciting than another multimodal model launch. It’s also where some of the most serious engineering is happening.
