Court filings give the clearest look yet at OpenAI and Jony Ive’s AI hardware plan
OpenAI’s $6.5 billion deal for Jony Ive’s io already showed the company wants to get beyond apps and chatbots. Newly unsealed court filings make that push easier to picture.
The documents describe early work on a dedicated AI device, or several prototypes, across desktop, portable, wired, wireless, mobile, and wearable forms. OpenAI and io reportedly describe it as a “third device” next to the phone and laptop. The filings also say the product is neither a traditional wearable nor an in-ear device, which rules out a few obvious guesses without telling us much about what it actually is.
That ambiguity matters because AI hardware has already produced a few expensive lessons. Humane’s AI Pin flopped. Rabbit’s R1 got attention, then struggled to justify itself as hardware rather than an app. If OpenAI and Ive want this to land, “AI with nice industrial design” won’t be enough.
The filings suggest they understand that.
What the prototype appears to do
The basic idea looks like ambient computing with stronger context awareness than a phone usually offers. Think continuous or semi-continuous sensing through microphones, motion sensors, proximity detection, and maybe cameras or depth sensors, plus enough local inference to decide what matters before everything gets sent to the cloud.
That’s the hard part. Context costs power, compute, and trust.
A device that listens, tracks motion, maps nearby space, and responds in real time has to solve four problems at once:
- collect messy sensor data without draining the battery
- fuse those streams into something a model can use
- keep latency low enough to feel immediate
- avoid turning into a privacy nightmare
The privacy piece sits right in the middle of the product, not off to the side. Ambient AI hardware only works if people trust it. A device that depends on persistent sensing will run into blunt questions about retention, consent, bystander capture, and where inference happens.
That’s the interesting part of this story. It’s a systems problem.
The hard part is sensor fusion
The filings point to a device that reads surroundings, not just prompts. That implies a sensor stack with at least audio and inertial data from an IMU, and possibly visual or depth input for spatial awareness.
For developers, this will sound familiar. The technical work looks a lot like robotics, mobile AR, and embedded ML, with LLMs layered on top rather than doing all the heavy lifting.
Raw sensor streams are noisy and asynchronous. Audio frames arrive on one clock, IMU samples on another, camera frames on another. Before any model can infer that a user is walking into a meeting room or that someone nearby is speaking to the device, those streams need alignment, filtering, and feature extraction.
A toy version could sync accelerometer timestamps with an audio buffer and feed combined features into an on-device model. Fine for a demo. Production systems are where it gets ugly. You need buffering strategy, clock drift handling, wake-word or event detection, adaptive sampling, and serious power management. Miss those details and the product either feels dim or dies before lunch.
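That toy version can be sketched in a few lines. This is pure Python with invented sample data, and it assumes both streams already share a common timebase; a real device would need clock-drift correction before this step even makes sense:

```python
from bisect import bisect_left

def align_imu_to_audio(audio_frames, imu_samples):
    """Pair each audio frame with the nearest-in-time IMU sample.

    audio_frames: list of (timestamp_s, frame_data)
    imu_samples:  list of (timestamp_s, accel_xyz), sorted by time
    """
    imu_ts = [t for t, _ in imu_samples]
    fused = []
    for t, frame in audio_frames:
        i = bisect_left(imu_ts, t)
        # consider the neighbours on either side, pick the closer one
        candidates = [j for j in (i - 1, i) if 0 <= j < len(imu_samples)]
        j = min(candidates, key=lambda k: abs(imu_ts[k] - t))
        fused.append({"t": t, "audio": frame, "accel": imu_samples[j][1]})
    return fused

# toy streams arriving on slightly different clocks
audio = [(0.00, "frame0"), (0.02, "frame1"), (0.04, "frame2")]
imu = [(0.005, (0.0, 0.0, 9.8)), (0.015, (0.1, 0.0, 9.8)),
       (0.025, (0.0, 0.1, 9.7)), (0.035, (0.0, 0.0, 9.8))]

features = align_imu_to_audio(audio, imu)
```

Nearest-neighbour pairing is the simplest possible fusion; production pipelines typically interpolate, buffer, and resample instead.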
That’s why the likely architecture is hierarchical:
- ultra-low-power sensing runs most of the time
- lightweight local models classify events and context
- heavier inference kicks in only when confidence clears a threshold
- cloud models handle expensive reasoning when needed
That pattern already exists in phones, earbuds, and smart home devices. The difference here is scope. OpenAI seems to be building a product where that stack is the product.
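The cascade above can be sketched as a chain of gates, each stage only paying for the next when the previous one fires. Every threshold and label here is invented for illustration; the point is the control flow, not the models:

```python
def cheap_event_gate(audio_energy: float) -> bool:
    """Stage 1: ultra-low-power check, e.g. energy above a noise floor."""
    return audio_energy > 0.2  # hypothetical threshold

def local_classifier(features: dict) -> tuple:
    """Stage 2: stand-in for a small on-device model returning
    (label, confidence)."""
    return ("speech_directed_at_device", features["score"])

def cloud_reasoner(label: str) -> str:
    """Stage 3: placeholder for an expensive cloud call; it never runs
    unless the earlier stages escalate."""
    return f"cloud handled: {label}"

def handle_sample(audio_energy: float, features: dict,
                  confidence_floor: float = 0.8) -> str:
    if not cheap_event_gate(audio_energy):
        return "ignored"              # stays in the low-power path
    label, conf = local_classifier(features)
    if conf < confidence_floor:
        return "ignored"              # not confident enough to act
    if label == "speech_directed_at_device":
        return cloud_reasoner(label)  # only now pay for the cloud
    return f"handled locally: {label}"
```

Most samples should die at stage 1; that is where the battery budget is won or lost.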
On-device inference has to carry a lot of the load
Any ambient assistant that ships everything to the cloud hits the same three walls fast: latency, privacy, and cost.
Latency is obvious. A device meant to respond in the middle of daily activity can’t feel like a remote API call. Privacy is just as obvious. Constant upstream capture of environmental audio or video won’t fly in plenty of settings. Cost may be the sleeper problem. If every glance, motion, spoken phrase, and context update triggers cloud inference, the economics get ugly fast.
So local inference is table stakes.
That puts model compression and edge runtimes above flashy demos. Quantization to int8 or int16, pruning, selective activation, and tight scheduling across CPU, NPU, or custom accelerators become first-order design choices. If OpenAI and io want a device that feels responsive, they’ll need a strict split between what runs locally and what gets pushed upstream.
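The arithmetic behind int8 quantization is simple enough to show directly. This is a minimal affine-quantization sketch, not tied to any particular runtime; real toolchains add per-channel scales and calibration data on top of this:

```python
def quantize_int8(weights):
    """Affine int8 quantization: map a float range onto [-128, 127].

    Returns (q, scale, zero_point) such that w ~= (q - zero_point) * scale.
    """
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against a flat tensor
    zero_point = round(-128 - w_min / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.51, -0.02, 0.0, 0.27, 0.49]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
# each restored value sits within one quantization step of the original
```

The payoff is a 4x smaller weight tensor and integer math the NPU can execute cheaply, at the cost of bounded rounding error.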
The likely toolchain implications are pretty clear:
- ONNX Runtime, TensorFlow Lite, or a custom inference stack for embedded Linux-class hardware
- event-driven pipelines instead of always-on full-resolution processing
- multimodal model architectures built for intermittent connectivity
- secure local storage, encrypted caches, secure boot, and signed model updates
The source material mentions custom ASIC or FPGA blocks as one possible route. Plausible, sure, but custom silicon is expensive and slow unless volume justifies it. A more realistic early path is hybrid hardware built from off-the-shelf edge compute and aggressively optimized models. Later revisions could get more specialized once real usage patterns emerge.
That’s the sensible order of operations. A lot of hardware companies skip it and build for assumptions that don’t survive contact with users.
Why Ive matters
Reducing this story to “OpenAI buys famous Apple designer” misses the point.
Ive matters because ambient AI hardware has a nasty human-factors problem. The device has to fit into daily life without becoming one more thing to charge, wear, pair, update, explain, or apologize for. If it asks for too much attention, people drop it. If it disappears too completely, they won’t trust it. The industrial design problem is tied directly to the interface model.
That interface model is where things get hard fast.
A screen absorbs ambiguity. If a system only half understands your intent, it can throw options back at you. Ambient hardware gets fewer escape routes. Voice is awkward in public. Gestures are easy to misread. Spatial audio helps in some cases, not many. Haptics work for confirmation and not much else. So the product has to be selective about when it interrupts and very good at deciding when to stay quiet.
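That selectivity is ultimately a policy decision the software has to make on every event. A toy version, with thresholds invented purely for illustration, might weigh confidence and urgency against a context-dependent floor:

```python
def should_interrupt(confidence: float, urgency: float, context: str) -> bool:
    """Decide whether the device speaks up or stays quiet.

    Hypothetical floors: a public setting demands near-certainty,
    while a solo context can tolerate more false alarms.
    """
    floor = {"public": 0.95, "meeting": 0.9, "solo": 0.7}.get(context, 0.9)
    return confidence * urgency >= floor

# high-confidence, high-urgency event in public: interrupt
verdict_public = should_interrupt(0.99, 1.0, "public")
# moderate signal in a meeting: stay quiet
verdict_meeting = should_interrupt(0.8, 0.9, "meeting")
```

The real policy would be learned and personalized, but the shape of the problem is the same: every interruption is a bet against the user's patience.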
That’s a much harder job than wrapping a chatbot in nice hardware.
The field is already crowded
OpenAI isn’t entering an empty category. Meta keeps pushing smart glasses. Google is back in AR hardware discussions. Apple is reportedly exploring camera-equipped AirPods and keeps building more on-device intelligence into its chips. Everyone sees the same opening: AI gets more useful when it has persistent sensory input and a physical presence.
The problem is that most current products still feel like awkward companions to the phone, not convincing replacements for any part of it.
That may explain the internal “third device” framing. It suggests OpenAI isn’t trying to replace the smartphone outright. It’s trying to define a new layer for the moments when the phone is too manual, too distracting, or just not in the right place.
For technical leaders, that matters. If this category gets traction, the near-term platform question won’t be whether mobile apps disappear. It’ll be which parts of a product belong in low-latency ambient workflows and which still belong on screens.
What developers should watch
Forget the concept art. Watch the SDKs and runtime choices.
If OpenAI opens this to developers, the useful signals will look boring at first but reveal a lot underneath.
Sensor access
Do developers get raw streams, high-level events, or both? Raw access offers flexibility but opens up privacy and battery problems. Event APIs are safer and easier, but they narrow what developers can build.
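An event-style API, if that is the route OpenAI takes, might look something like this sketch. Every name here is hypothetical; the point is that third-party code would receive interpreted events rather than raw bystander audio:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContextEvent:
    """Hypothetical shape of a high-level event API."""
    kind: str          # e.g. "speech_start", "entered_room", "pickup"
    confidence: float  # model confidence in [0, 1]
    timestamp_ms: int

def on_event(event: ContextEvent) -> str:
    # an app subscribes to events instead of polling sensors
    if event.kind == "entered_room" and event.confidence >= 0.8:
        return "prepare meeting notes"
    return "no-op"
```

The trade-off is visible even in a sketch this small: the app never sees who else was in the room, but it also can't build anything the event vocabulary doesn't cover.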
Local inference boundaries
Can third-party apps run their own models on-device? If yes, on what runtime and with what quotas? If no, this looks much more like a managed assistant than a general compute platform.
Permissioning and compliance
Ambient hardware needs stronger permission models than mobile did. Audio and visual context often includes bystanders, workplaces, regulated environments, and private spaces.
Handoff to existing systems
Enterprises won’t care about a polished demo if the device can’t plug into their software. That means APIs, local networking, identity, device management, audit logs, and probably some gRPC or REST layer for vertical integrations.
The early use cases that make sense are easy to spot: meeting capture with local transcription, field service guidance, assistive tools for visually impaired users, industrial anomaly detection using vibration and sound, and context-aware coaching. Those are narrow enough to justify dedicated hardware and valuable enough to survive procurement review.
Consumer general-purpose assistants are a tougher sell, at least at first.
Bigger than one gadget
The filings don’t tell us exactly what OpenAI and io will ship. They do show that the company is thinking past chat surfaces and browser tabs. It wants AI to be present, context-aware, and physically embedded.
That’s a serious bet, and a risky one. Ambient devices ask a lot from hardware, software, and users’ patience. Most attempts so far have promised too much and shipped too little.
OpenAI does have one advantage the first wave lacked: frontier models paired with a hardware program built around inference, context, and industrial design from day one. If that works, developers won’t just be building for screens with an AI layer bolted on. They’ll be building for systems that listen, infer, and act at the edge.
That changes the stack. It raises the stakes too.