Waabi and Apptronik show where AI hardware gets real
TechCrunch Disrupt 2025 is putting Waabi and Apptronik on the same stage because they’re wrestling with the same class of problem. Waabi builds autonomous trucking systems. Apptronik builds humanoid robots. Different products, same job: making AI survive contact with the physical world.
That’s where the hard work is now. Real machines have latency budgets, battery limits, sensor drift, ugly edge cases, and people nearby.
For developers and technical leads, that changes the center of gravity. Raw model capability still matters, but the bottleneck is shifting toward systems engineering. Perception demos are easy to make look good. Shipping a truck or a humanoid that stays safe through 10,000 weird situations is much harder.
Why these two belong in the same conversation
Waabi has been one of the strongest voices for simulation-first autonomy. The case is pretty simple. If you want safe autonomous vehicles, especially trucks, road testing alone won’t get you there. You need high-fidelity simulation, synthetic scenario generation, closed-loop evaluation, and safety evidence that holds up better than “we drove a lot of miles.”
Apptronik works in a different part of robotics, but the engineering constraints are starting to rhyme. Humanoids need perception, planning, control, and safety systems that can handle dynamic spaces built for humans. They also have to make physical contact with the world, so mistakes show up fast. A bad plan becomes a dropped object, a balance failure, or an unsafe interaction.
That overlap matters. AVs and humanoids are starting to converge on the same stack ideas:
- simulation and sim2real
- world models and data engines
- edge inference under real-time constraints
- formal safety cases and traceable validation
- OTA update discipline, security, and operational telemetry
The industry spent years treating these as separate categories. That separation is getting harder to defend.
Simulation is now table stakes
Waabi’s sim-first stance used to sound contrarian. It doesn’t anymore. The economics are too clear.
Long-tail failures sink physical AI. Rare merges, strange pedestrian behavior, bad weather, ambiguous lane markings, cargo oddities, temporary construction changes. In robotics, switch that to cluttered shelves, partial grasps, unstable loads, reflective surfaces, and humans doing unpredictable things nearby. Teams need to generate and replay these cases constantly.
That’s why generative scenario engines matter. Modern simulation stacks can create synthetic traffic agents, perturb sensor conditions, and vary physics fast enough to stress a policy far more efficiently than real-world collection alone. Domain randomization still has a place, but the fidelity bar keeps moving up. Camera simulation, LiDAR noise models, RADAR behavior, neural rendering, even Gaussian Splatting are pushing simulators closer to something engineers can actually trust.
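To make the scenario-perturbation idea concrete, here is a minimal sketch in the spirit of domain randomization. All parameter names, ranges, and weights are invented for illustration; real scenario engines expose far richer APIs than this.

```python
import random

def randomize_scenario(base, rng):
    """Perturb a nominal driving scenario: agent speeds, sensor noise, weather.

    `base` is a dict of nominal parameters; every field here is illustrative,
    not any vendor's actual scenario schema.
    """
    return {
        "lead_vehicle_speed_mps": base["lead_vehicle_speed_mps"] * rng.uniform(0.7, 1.3),
        "cut_in_gap_m": max(2.0, base["cut_in_gap_m"] + rng.gauss(0, 3.0)),
        "camera_noise_sigma": rng.uniform(0.0, 0.05),   # domain randomization
        "rain_rate_mm_h": rng.choice([0, 0, 5, 25]),    # bias toward clear weather
    }

rng = random.Random(42)  # seeded so a scenario batch is reproducible
base = {"lead_vehicle_speed_mps": 25.0, "cut_in_gap_m": 12.0}
variants = [randomize_scenario(base, rng) for _ in range(1000)]
# Each variant would then be replayed through the policy in closed loop (not shown).
```

The point is less the sampling than the workflow: a seeded generator makes every stress batch replayable, which is what turns a failure into a regression test.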
High fidelity is not the same as useful, though, and useful is the hard part.
A high-fidelity simulator that mirrors your own blind spots just gives you cleaner failure in production. Teams still need shadow mode, targeted real-world validation, and active error mining. That applies to trucks and humanoids alike. Sloppy sim2real work will catch up with you.
Still, the direction is obvious. Physical AI teams that don’t build simulation into the loop are choosing slower iteration, thinner safety evidence, and worse economics.
Humanoids still rise or fall on control
Apptronik is a useful reminder that humanoids still depend on mechatronics and control. Foundation models can help with perception and task planning. Physics still runs the schedule.
If a humanoid is going to work around people, actuation matters a lot. Torque-controlled systems and series-elastic actuators let the robot absorb contact, sense force, and avoid turning every collision into a safety event. But compliance has a cost. You usually give up some throughput and stiffness, and that affects cycle time, payload handling, and the size of the safety perimeter around the machine.
Control loops in this class often run at 500 to 1000 Hz. That sits very far from the rhythm of large multimodal model inference. Whole-body control, often with QP-based optimization or MPC, has to keep the robot balanced while manipulation policies decide what the hands, arms, and torso should do. If perception is late, the scheduler still has to keep the machine upright.
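The scheduling idea, a fast control loop that never blocks on slow perception, can be sketched with simulated ticks. Everything below (names, rates, the toy proportional controller, the staleness threshold) is illustrative, not any vendor's controller.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    tick: int
    value: float

def control_loop(perception_stream, n_ticks, max_staleness=5):
    """Toy 'keep the machine upright' scheduler: the fast loop reuses the last
    estimate when perception misses a deadline, then degrades to a safe
    command once that estimate is too stale. All names are made up."""
    last = Estimate(tick=0, value=0.0)
    commands = []
    for tick in range(1, n_ticks + 1):
        fresh = perception_stream.get(tick)      # None when inference is late
        if fresh is not None:
            last = Estimate(tick, fresh)
        if tick - last.tick > max_staleness:
            commands.append(("SAFE_STOP", 0.0))  # graceful degradation
        else:
            commands.append(("TRACK", -0.5 * last.value))  # toy P-controller
    return commands

# Perception arrives every 10 ticks (think 100 Hz inference vs a 1 kHz loop),
# then drops out entirely after tick 30.
stream = {t: 0.1 * t for t in range(10, 31, 10)}
cmds = control_loop(stream, n_ticks=50)
```

The real versions of this are far more involved, but the invariant is the same: the control rate is fixed, and late perception changes what the loop commands, never whether it runs.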
That’s why humanoid demos can tell you very little. A polished pick-and-place clip says nothing about duty cycle, thermal throttling, recovery behavior, or how the machine handles a slightly misaligned object six hours into a battery run. Those are product questions, not demo questions.
The companies worth taking seriously know that. Power-aware planning, thermal derating, embedded torque sensing, force limiting, and E-Stop hierarchies are product features.
Edge compute hits the wall first
Cloud AI trained the industry to think in bigger clusters and longer runs. Mobile autonomy works inside a power and thermal envelope, and it still has to hit deadlines every cycle.
That’s why platforms like NVIDIA Orin and Thor, Qualcomm’s robotics and automotive silicon, Ambarella’s CV3-AD, and Mobileye’s systems keep getting so much attention. The pitch is familiar: enough heterogeneous compute to run perception, planning, and control, plus hardware paths for lower-precision inference such as INT8 and FP8.
The trade-off is familiar too. Compression saves watts and cost, but it can erode edge-case accuracy. In a chatbot, maybe that’s acceptable. In a truck at highway speed or a humanoid handling tools next to a person, it needs guardrails. Teams need regression suites aimed at failure cases, dual-path fallbacks for safety-critical functions, and the discipline to prefer deterministic behavior over benchmark theater.
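A regression gate over mined failure cases can be sketched as follows, with a grid-snapped function standing in for a quantized model. The functions, tolerance, and cases are invented; the pattern, compare both paths on hard cases and block rollout on divergence, is the point.

```python
def reference_model(x):
    """Stand-in for the full-precision perception head (illustrative)."""
    return 0.8 * x + 0.1

def quantized_model(x, scale=0.05):
    """Crude stand-in for a lower-precision path: outputs snapped to a grid,
    mimicking the resolution loss of aggressive quantization."""
    y = reference_model(x)
    return round(y / scale) * scale

def regression_gate(hard_cases, tolerance=0.03):
    """Run both paths over mined failure cases; block the rollout if any
    case diverges beyond tolerance. Thresholds here are arbitrary."""
    failures = [x for x in hard_cases
                if abs(reference_model(x) - quantized_model(x)) > tolerance]
    return {"passed": not failures, "diverged_on": failures}

report = regression_gate([0.0, 0.31, 0.77, 1.25, 2.9])
```

In practice the "hard cases" come from the fleet's error-mining pipeline, not a hand-picked list, which is why this gate and the data engine discussed later are really one system.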
The software stack matters just as much as the silicon. Hard real-time control usually lives in RTOS partitions, while Linux with PREEMPT_RT handles less strict workloads. ROS 2 and DDS are common, but common doesn’t mean easy. QoS tuning, transport selection, timestamping, and loaned messages all matter if latency is a real requirement and not something left for the architecture slide.
Observability is becoming non-negotiable too. Physical AI teams are borrowing from modern systems tooling because they have to. PTP time sync, chrony, end-to-end timestamping, on-device tracing, and eBPF-style observability are increasingly practical requirements. If a perception node jitters, fusion drifts, and control misses a deadline, you need to know where the time went.
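The "where did the time go" question ultimately reduces to per-stage timestamp accounting. A minimal sketch, with made-up stage names and millisecond timestamps for a single frame:

```python
def account_latency(stamps, deadline_ms):
    """Given ordered per-stage timestamps (ms) for one frame, report per-stage
    durations, the slowest hop, and whether the end-to-end deadline was
    missed. A toy version of latency budgeting; field names are invented."""
    names = [s[0] for s in stamps]
    times = [s[1] for s in stamps]
    durations = {f"{names[i]}->{names[i + 1]}": times[i + 1] - times[i]
                 for i in range(len(stamps) - 1)}
    total = times[-1] - times[0]
    worst = max(durations, key=durations.get)  # slowest pipeline hop
    return {"total_ms": total, "missed": total > deadline_ms,
            "worst_stage": worst, "stages": durations}

frame = [("capture", 0.0), ("perception", 38.0), ("fusion", 44.0),
         ("planning", 71.0), ("control", 73.0)]
report = account_latency(frame, deadline_ms=50.0)
```

This only works if every stage stamps against the same clock, which is exactly why PTP-style time sync shows up on the requirements list.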
Sensors are cheaper. Safety is harder
One useful shift in this market is that sensors and accelerators are getting less exotic. Cameras are cheap. Depth sensing has improved. LiDAR still costs money, but it’s no longer a magical object. More companies can assemble a capable hardware stack.
That pushes differentiation somewhere else.
In AVs, multi-camera setups plus long-range RADAR and LiDAR, often fused into BEV representations, are becoming standard for teams that care about adverse conditions and planning stability. In humanoids, the stack looks different but follows the same pattern: short-range depth, joint torque sensing, tactile feedback, and sometimes visuotactile loops for manipulation.
Calibration is a bigger problem than many teams admit. Extrinsics drift. Time offsets creep in. Mechanical wear adds up. A fusion pipeline that looked solid in the lab gets sloppier over long deployments. Continuous online calibration isn’t glamorous, but it’s one of the reasons some systems stay reliable while others decay in the field.
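Continuous online calibration can be as simple in spirit as tracking an alignment residual with an exponential moving average and alarming on drift. A toy sketch, assuming a single yaw offset rather than the full SE(3) extrinsic a real system would estimate:

```python
class OnlineExtrinsicMonitor:
    """Track a slowly drifting extrinsic offset (e.g. camera-to-LiDAR yaw)
    with an exponential moving average of per-frame alignment residuals.
    Illustrative only; alpha and the alarm threshold are arbitrary."""

    def __init__(self, alpha=0.05, alarm_deg=0.5):
        self.alpha = alpha          # smoothing factor for the running estimate
        self.alarm_deg = alarm_deg  # drift beyond this triggers recalibration
        self.offset_deg = 0.0

    def update(self, residual_deg):
        # Blend the new residual into the running offset estimate.
        self.offset_deg += self.alpha * (residual_deg - self.offset_deg)
        return abs(self.offset_deg) > self.alarm_deg

mon = OnlineExtrinsicMonitor()
# Simulated mechanical wear: residuals creep from 0 toward ~1 degree.
alarms = [mon.update(min(1.0, 0.01 * k)) for k in range(300)]
```

The smoothing matters: single-frame residuals are noisy, and a monitor that reacts to every spike would recalibrate constantly instead of catching genuine mechanical drift.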
Safety engineering is also getting more concrete. Standards like SOTIF, UL 4600, ISO 10218, ISO/TS 15066, UNECE R156, and ISO 21434 don’t make a system safe by themselves, but they do force structure around hazards, software updates, and cybersecurity. That matters because the safety case is turning into a core product artifact. Plenty of AI teams still behave as if documentation starts after the model works. In physical AI, that’s amateur behavior.
The teams that win here may not have the flashiest model. They’ll have tighter timing control, cleaner safety evidence, and better failure containment.
What to watch
If you’re building in robotics, autonomy, or adjacent edge AI, a few patterns stand out.
The data engine matters as much as the model. Fleet logging, scenario tagging, error mining, and selective capture are how teams keep storage and labeling costs from getting out of hand. Full-fidelity logging across a fleet gets expensive quickly. You need on-device triage and a clear sense of which failures are worth keeping.
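On-device triage is, at its core, scoring frames by trigger signals against an upload budget. A minimal sketch with invented trigger names and weights:

```python
def triage(frame_events, budget):
    """Score logged frames by their trigger signals and keep only the top
    `budget` for upload; the rest stay on-device or age out. Trigger names
    and weights are invented for illustration."""
    weights = {"intervention": 10.0, "planner_disagreement": 5.0,
               "rare_object": 3.0, "hard_brake": 2.0}
    scored = [(sum(weights.get(e, 0.0) for e in events), fid)
              for fid, events in frame_events]
    keep = sorted(scored, reverse=True)[:budget]
    return [fid for score, fid in keep if score > 0]  # drop untriggered frames

frames = [("f1", ["hard_brake"]),
          ("f2", []),
          ("f3", ["intervention", "rare_object"]),
          ("f4", ["planner_disagreement"])]
selected = triage(frames, budget=2)
```

The weights are where the engineering judgment lives: an intervention is almost always worth keeping, while a routine hard brake usually is not.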
Shadow mode should be standard practice. Run new planners or policies alongside production, compare action deltas, and keep them out of the control loop until they’ve earned trust. AV teams have known this for years. A lot of robotics teams still underuse it.
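A shadow-mode harness can be very small: same observations in, candidate commands logged but never executed, divergences flagged for review. A sketch with stand-in policies and an arbitrary threshold:

```python
def shadow_compare(production, candidate, observations, max_delta=0.2):
    """Run a candidate policy in shadow: it sees the same inputs, but its
    commands never reach actuators. Collect observations where the two
    policies diverge beyond `max_delta`. Names and threshold are invented."""
    divergences = []
    for obs in observations:
        prod_cmd = production(obs)
        cand_cmd = candidate(obs)   # logged for analysis, never executed
        if abs(prod_cmd - cand_cmd) > max_delta:
            divergences.append((obs, prod_cmd, cand_cmd))
    return divergences

production = lambda x: 0.5 * x                          # stand-in steering policy
candidate = lambda x: 0.5 * x + (0.3 if x > 2.0 else 0.0)  # diverges at high x
deltas = shadow_compare(production, candidate, [0.5, 1.0, 2.5, 3.0])
```

The flagged cases then become exactly the kind of mined failures the simulation and regression pipelines feed on, which is why shadow mode pays for itself.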
Security and update plumbing belong in the architecture from day one. Signed artifacts, secure boot, provenance tracking, and controlled OTA rollouts are basic hygiene for systems that can cause real harm when they fail.
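Verify-before-install is the core of that plumbing. The sketch below uses a shared-key HMAC from Python's standard library purely for illustration; production OTA systems use asymmetric signatures (e.g. Ed25519) with hardware-backed keys and secure boot, not a shared secret.

```python
import hashlib
import hmac

def sign_artifact(payload: bytes, key: bytes) -> str:
    """Illustrative signing: HMAC over the SHA-256 digest of the artifact."""
    return hmac.new(key, hashlib.sha256(payload).digest(),
                    hashlib.sha256).hexdigest()

def verify_before_install(payload: bytes, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time; refuse to
    install anything that fails."""
    expected = sign_artifact(payload, key)
    return hmac.compare_digest(expected, signature)

key = b"device-provisioned-key"        # placeholder; never hard-code real keys
firmware = b"planner-v2.bin contents"  # stand-in for an OTA artifact
sig = sign_artifact(firmware, key)

ok = verify_before_install(firmware, sig, key)              # untampered artifact
tampered = verify_before_install(firmware + b"\x00", sig, key)  # one byte flipped
```

Even in this toy form, the constant-time comparison is deliberate: naive string equality leaks timing information an attacker can exploit.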
Structured environments still look like the near-term commercial sweet spot. Yard logistics, warehouses, manufacturing cells, retail backrooms. Places with narrower variability and controllable safety zones. That may frustrate people still betting on fully general home robots, but it’s the sane place to start.
Waabi and Apptronik are building different products. They’re exposing the same industry fact. Physical AI is settling into an engineering discipline where simulation quality, control reliability, edge systems work, and safety cases matter more than another flashy model demo.
That’s healthier for the field. It’s also a lot less forgiving.