Apple picks Gemini for Siri, and that tells you a lot about where AI assistants are headed
Apple has confirmed a multi-year partnership with Google to power AI features, including Siri, with Gemini and Google Cloud technology. The news matters because it says something pretty blunt about Apple’s AI stack. After delays and a lot of privacy-first framing, Apple is bringing in outside model capability to get Siri where it needs to be.
A rebuilt Siri is expected in spring 2026. Apple wants something that can pass for a modern assistant, not another demo-friendly upgrade that falls apart on real requests. Gemini gives it a stronger model tier now, without forcing Apple to throw out the privacy case it has spent years making.
For developers and AI engineers, this looks like a practical assistant architecture. Small models on device. Apple-controlled cloud for sensitive or medium-weight work. Gemini for the expensive stuff. That’s a sensible stack, and probably Siri’s best shot at catching up this year.
Apple’s AI plan got more grounded
Apple introduced Apple Intelligence in 2024 with the usual privacy-heavy pitch and a set of useful but limited features: notification summaries, writing tools, image features, photo search. None of that made Siri feel like a serious assistant. The bigger Siri refresh slipped more than once, which usually means the hard parts were exactly what they looked like: engineering problems.
Assistant-grade AI is messy under the hood. It needs low latency, high reliability, decent reasoning, access to user context, and safety controls that hold up when the system starts touching messages, photos, reminders, or app actions. Apple had parts of that. It didn’t have enough model capability, fast enough, at the right scale.
So now it’s buying time and capability from Google.
That doesn’t mean Apple has abandoned its own models. It means Apple has accepted the architecture most teams end up with: run what you can locally, keep sensitive cloud flows on infrastructure you control, and hand the hardest prompts to a frontier model that’s already good at them.
The single-model fantasy didn’t last.
What Siri probably looks like now
Apple and Google haven’t published a system diagram, but the broad shape is easy to guess.
A voice request starts on device. Audio gets transcribed locally through ASR. Intent classification probably happens there too, along with quick checks on whether the request is a simple command, a tool call, or something that needs deeper reasoning. If it’s “set a timer for 10 minutes,” “open Notes,” or “find the photo from last Tuesday,” that can stay local, or close to it.
If the request needs more context or more compute, Apple can route it to a private cloud tier. This is where Apple’s privacy posture matters. The company has already described a tightly locked-down approach for cloud inference with Private Cloud Compute: isolated workloads, short-lived processing, audited software images, minimal retention. That middle tier lets Apple use server resources without immediately shipping raw user context to a third party.
Then there’s Gemini.
That top tier is for the messy requests: long-context reasoning, multimodal prompts, planning across apps, generating structured outputs from large user contexts, maybe handling an image plus a natural-language follow-up plus a tool invocation in one flow. Gemini is well suited to that, and Apple knows it.
A likely routing path looks like this:
- Siri captures audio and runs ASR on device.
- A router scores complexity, privacy sensitivity, and confidence.
- Apple builds a constrained prompt with only the context needed.
- If tools are needed, the model returns structured output, probably JSON or a function-call style schema.
- Siri executes via App Intents, SiriKit, or app-specific actions.
- Policy checks verify the result before the user hears or sees it.
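In code, that routing decision might look something like the sketch below. The tiers, signals, and thresholds are all invented for illustration; Apple has published nothing about how its router actually scores requests.

```swift
// Hypothetical tier router. Names, signals, and thresholds are all invented;
// nothing here reflects Apple's actual implementation.
enum InferenceTier {
    case onDevice       // fast and private, limited capability
    case privateCloud   // Apple-controlled servers
    case frontierModel  // third-party frontier model such as Gemini
}

struct RoutingSignal {
    let complexity: Double          // 0...1, from the local intent classifier
    let privacySensitivity: Double  // 0...1, how personal the required context is
    let localConfidence: Double     // 0...1, how sure the on-device model is
}

func route(_ signal: RoutingSignal) -> InferenceTier {
    // Simple commands the local model is confident about stay on device.
    if signal.localConfidence > 0.9 && signal.complexity < 0.3 {
        return .onDevice
    }
    // Highly sensitive context stays on infrastructure Apple controls.
    if signal.privacySensitivity > 0.7 {
        return .privateCloud
    }
    // Genuinely hard requests go to the frontier tier; everything else
    // lands in the middle.
    return signal.complexity > 0.6 ? .frontierModel : .privateCloud
}
```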
That’s how you keep an assistant useful without giving a probabilistic model broad permission to do whatever it wants.
Why Gemini fits
Gemini brings three things Apple needs.
First, long context. A useful assistant has to reason across prior interactions, app state, calendar data, notes metadata, recent photos, and whatever else a request pulls in. Small local models tend to struggle once the job turns into a sequence instead of a single intent.
Second, multimodality. Siri needs to handle voice, text, images, and app context in the same interaction. “Find the photo where I’m skiing with Alice, draft a message to her, and suggest times based on my Friday calendar” is the kind of request people actually try. Simpler systems break on that fast.
Third, tool use. Assistants live or die on execution. If Gemini can produce structured outputs that map cleanly to app actions, it helps. If it mostly sounds fluent, that’s not enough.
Apple will probably keep free-form generation on a short leash. Anywhere hallucination gets expensive, Siri should favor deterministic tool calls, app intents, grounding against approved user data, and strict policy filters. That’s the right instinct. Nobody wants an eloquent assistant that booked the wrong flight.
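One way to keep that leash short is to treat model output as untrusted input: decode it against a strict schema and check it against an allowlist before anything executes. A minimal sketch, with invented field names rather than any real wire format:

```swift
import Foundation

// Illustrative function-call payload; field names are invented,
// not Apple's or Google's actual format.
struct ToolCall: Codable {
    let name: String                // must match a registered app action
    let arguments: [String: String] // keep argument types simple and validatable
}

enum ToolPolicyError: Error {
    case unknownTool(String)
}

// Decode strictly, then gate on an allowlist before dispatching.
func validateToolCall(_ raw: Data, allowed: Set<String>) throws -> ToolCall {
    let call = try JSONDecoder().decode(ToolCall.self, from: raw)
    guard allowed.contains(call.name) else {
        throw ToolPolicyError.unknownTool(call.name)
    }
    return call // now safe to hand to the executor
}
```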
The trade-offs
Apple gets capability and speed, but it also takes on a dependency on Google at the same moment big platform companies are trying to reduce exactly those dependencies.
There’s a branding problem too. Apple has spent years selling the idea that it builds the whole stack. Now one of the most important parts of its assistant experience may come from its biggest mobile rival. Users probably won’t care much if Siri gets better. Apple will.
Privacy is the harder issue.
Apple can minimize prompts, redact PII, constrain retrieval, and keep the most sensitive requests off third-party infrastructure. It should. But the more useful Siri gets, the stronger the temptation to pass richer context upstream. Long-context assistants work better when they know more. That’s also where the risk starts climbing. The engineering problem here isn’t only model quality. It’s context discipline.
Latency is another problem Apple can’t hand-wave away. On-device flows can feel instant, or close enough. Cloud calls can’t. If the Gemini path regularly pushes response times past 600 ms, Apple will need aggressive streaming and solid turn-taking design so Siri still feels responsive while inference is happening. Voice UI is less forgiving than text chat.
Then there’s reliability. Hybrid routing adds failure modes: model timeouts, policy fallback, cloud unavailability, schema mismatch, tool execution errors. A polished assistant needs graceful degradation across all of that. Siri hasn’t exactly built up trust here.
A big win for Google
For Google, this is a major distribution win. Gemini gets placement inside one of the world’s biggest consumer computing platforms.
That matters more than benchmark chatter. Distribution is the fight now. The assistant layer sits in front of apps, messages, media, search, and personal context. Whoever powers that layer gets a lot of influence over where users go next and which workflows become normal.
It also puts pressure on OpenAI and Anthropic. Apple reportedly tested both. Google won this round. That doesn’t mean it keeps the slot forever. The deal is non-exclusive, which is smart for Apple and probably necessary given Google’s antitrust constraints. Regulators are already watching default deals closely, and any large Apple-Google agreement now lands in a much touchier legal climate than the old search arrangement did.
What iOS developers should do
If Siri gets materially better, apps with clean action surfaces will benefit. Apps without them will start to look dated.
The immediate priority is App Intents. If your app still treats assistant access as a side feature, fix that. Models work best when your action surface is structured, typed, disambiguated, and predictable.
A few things matter:
- Define clear intent schemas with explicit parameters and return values.
- Make actions idempotent where possible.
- Support confirmation for destructive operations.
- Return structured data that a planner can use, not blobs of text.
- Handle partial context and follow-up clarification cleanly.
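To make that checklist concrete, here's a minimal App Intents sketch. The note-creation intent and its NoteStore backing are invented, but the shape (typed parameters plus a structured return value a planner can chain on) is the point.

```swift
import AppIntents
import Foundation

// Note and NoteStore are invented stand-ins for an app's own storage layer.
struct Note { let id: UUID }

final class NoteStore {
    static let shared = NoteStore()
    func create(title: String, body: String) async throws -> Note {
        Note(id: UUID())
    }
}

struct CreateNoteIntent: AppIntent {
    static var title: LocalizedStringResource = "Create Note"

    @Parameter(title: "Title")
    var noteTitle: String

    @Parameter(title: "Body")
    var noteBody: String

    // Creation is non-destructive, so it runs without confirmation;
    // a delete intent would request one before performing.
    func perform() async throws -> some IntentResult & ReturnsValue<String> {
        let note = try await NoteStore.shared.create(title: noteTitle, body: noteBody)
        // Return an identifier a planner can chain on, not a blob of text.
        return .result(value: note.id.uuidString)
    }
}
```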
If you run server-side tools behind those intents, think like an API engineer. Short-lived auth. Strong validation. Tight rate limits. Full auditability. Assistant traffic produces weird edge cases because users phrase things loosely and models sometimes fill in blanks you didn’t want filled in.
Observability matters more than most teams expect. If Siri starts sending real usage into your app, you’ll want to know:
- Which flows came from assistant invocation
- Where latency accumulates
- Which intents fail validation
- When fallbacks fire
- Whether cloud-routed requests correlate with higher abandon rates
Use MetricKit, trace assistant-triggered paths, and separate tool execution metrics from normal in-app usage. Otherwise you’ll spend months guessing.
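A minimal version of that instrumentation, assuming all you want is assistant-triggered flows traced separately and MetricKit payloads shipped somewhere, might look like the sketch below. The subsystem name and upload stub are placeholders.

```swift
import MetricKit
import os

// Minimal observability sketch; wire the upload stub to your own pipeline.
final class AssistantMetrics: NSObject, MXMetricManagerSubscriber {
    static let shared = AssistantMetrics()
    private let signposter = OSSignposter(subsystem: "com.example.app",
                                          category: "assistant")

    func start() {
        MXMetricManager.shared.add(self)
    }

    // Wrap assistant-triggered flows so their latency shows up separately
    // from normal in-app usage.
    func traceAssistantFlow<T>(_ name: StaticString,
                               _ work: () async throws -> T) async rethrows -> T {
        let state = signposter.beginInterval(name)
        defer { signposter.endInterval(name, state) }
        return try await work()
    }

    // MetricKit delivers daily payloads (launches, hangs, signpost metrics).
    func didReceive(_ payloads: [MXMetricPayload]) {
        for payload in payloads {
            upload(payload.jsonRepresentation())
        }
    }

    private func upload(_ data: Data) {
        // Ship to your analytics backend; intentionally left as a stub.
    }
}
```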
Also build for degraded operation. If the top-tier model is unavailable, core actions should still work. An assistant that can’t complete basic tasks because the fancy reasoning tier is down is a bad product.
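A sketch of that degradation path, with invented types standing in for the real tiers:

```swift
// All types here are invented; the point is the ordering and the floor.
struct AssistantRequest { let utterance: String }
struct AssistantResponse { let text: String }

protocol InferenceHandler {
    func handle(_ request: AssistantRequest) async throws -> AssistantResponse
}

struct AssistantPipeline {
    let frontier: InferenceHandler      // e.g. a Gemini-backed tier
    let privateCloud: InferenceHandler  // Apple-controlled cloud tier
    let coreActions: (AssistantRequest) -> AssistantResponse // deterministic local floor

    func complete(_ request: AssistantRequest) async -> AssistantResponse {
        // Try the richest tier first, degrade in order, never fail outright.
        if let response = try? await frontier.handle(request) { return response }
        if let response = try? await privateCloud.handle(request) { return response }
        // Floor: timers, basic commands, and local search keep working
        // even when both cloud tiers are down.
        return coreActions(request)
    }
}
```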
This is the assistant stack that usually ships
There’s a broader lesson here. The clean keynote version of AI is a unified assistant powered by one coherent intelligence. The version that actually ships is a layered system: local models for speed and privacy, private compute for controlled expansion, frontier APIs for capability, tool calls for reliability, and a lot of policy code in between.
That stack is harder to market. It’s also how serious systems get built.
If Apple executes, Siri could finally get better at the parts that matter: app actions, cross-app planning, multimodal requests, and context-heavy tasks that used to collapse into canned responses. If it doesn’t, the problem won’t be that Gemini was weak. It’ll be routing, latency, tool reliability, and trust.
That’s the work now.