Generative AI · December 25, 2025

Waymo tests Gemini as a rider assistant in robotaxis, not the driving stack

Waymo’s Gemini pilot shows what an LLM in a robotaxi should look like

Waymo is testing Google’s Gemini as an in-car assistant for robotaxi riders, according to app code uncovered by researcher Jane Manchun Wong. The important part is the boundary. Gemini can answer questions, adjust some cabin settings, and help reassure riders. It’s kept away from driving, routing, and live commentary about what the vehicle is doing.

That line matters.

A lot of companies still bolt an LLM onto a product, give it a voice, connect a few APIs, and hope the behavior holds up. Waymo appears to be taking the harder, better route. The leaked prompt, reportedly more than 1,200 lines, reads like policy wrapped around a chat interface. That’s how you build this kind of system when real passengers are involved.

Useful because it stays in its lane

The Gemini assistant appears to handle a narrow set of rider-facing tasks:

  • answer general questions
  • adjust climate, lighting, and music
  • greet riders with pre-approved language
  • use lightweight rider context, including first name and trip history, to personalize replies

The list of things it won’t do is just as telling.

It won’t change the route. It won’t control seats, windows, or volume. It won’t comment on what the car is doing in the moment. It won’t speculate about safety events. If a rider asks how the car perceives the road, the assistant points back to the Waymo Driver and its sensor stack instead of pretending it has direct awareness.

That sounds restrictive. It’s good product design.

In a normal car, a human driver handles awkward questions and nervous passengers. In a robotaxi, there’s nobody up front to do that. So the assistant becomes part of the ride experience. Waymo is clearly trying to keep riders from confusing that layer with the actual driving system.

That matters for safety. It matters for liability too.

Probably a standard tool-calling setup, with stricter controls

Waymo hasn’t published the architecture, but the behavior described in the prompt points to a familiar pattern.

A rider says something through the in-car interface. The model classifies the intent. If the request maps to an approved action, the system calls a bounded tool. A service validates the parameters and executes the request. Then Gemini sends back a short acknowledgment.

Something like this:

{
  "intent": "adjust_climate",
  "params": { "target_temp": 72, "mode": "auto" },
  "policy_check": { "allowed": true, "min": 60, "max": 80 },
  "execute": "climate_service.adjust",
  "ack": "Setting the cabin to 72 degrees."
}

The interesting part isn’t the function call itself. Every major LLM stack can do that now. The interesting part is the hard split between conversation and state-changing actions.

If you’re building for any safety-sensitive setting, prompt rules alone won’t cut it. The model can propose a tool call. It should not decide policy. Real controls need an allowlist, parameter bounds, audit logs, rate limits, and service-layer validation that treats model output as untrusted input.
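A minimal sketch of that service-layer gate, in Python. The tool names, parameter bounds, and dictionary shapes here are placeholders, not Waymo's actual schema:

# Hypothetical gate: the model's proposed call is untrusted input,
# checked against a static allowlist and hard parameter bounds.
ALLOWED_TOOLS = {
    "adjust_climate": {"target_temp": (60, 80)},
    "set_lighting": {"brightness": (0, 100)},
}

def validate_tool_call(name: str, params: dict) -> dict:
    bounds = ALLOWED_TOOLS.get(name)
    if bounds is None:
        raise PermissionError(f"tool not on allowlist: {name}")
    clean = {}
    for key, value in params.items():
        if key not in bounds:
            raise ValueError(f"unexpected parameter: {key}")
        lo, hi = bounds[key]
        if not lo <= value <= hi:
            raise ValueError(f"{key}={value} outside [{lo}, {hi}]")
        clean[key] = value
    return clean

Note what this ignores: the policy_check field the model emitted in the JSON above. Policy lives in the service, not in the model's output.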

Waymo seems to get that.

The 1,200-line prompt says something real

A giant system prompt can be a bad sign. Plenty of teams dump style rules, refusal policies, edge cases, and safety language into one huge blob and call it engineering.

Here, the length probably reflects the job. A rider assistant in a robotaxi has to do several things at once:

  • maintain a stable identity
  • avoid claiming driving capabilities
  • keep answers short
  • refuse unsafe or out-of-scope requests
  • use rider context without getting creepy
  • redirect sensitive questions consistently
  • stay calm when the rider is stressed or confused

That’s a lot of behavioral scaffolding. For a public deployment, 1,200 lines doesn’t sound excessive.

Still, prompts are brittle when they’re carrying too much weight. The more behavior you encode only in natural-language instructions, the more likely it breaks under weird phrasing, adversarial input, or prompt injection. The safer setup is layered: prompt guidance, classifier checks, tool gating, and deterministic policy enforcement after the model responds.
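A rough sketch of that layering, with stubs standing in for every real component and reusing validate_tool_call from the earlier sketch. The point is the shape: everything after the model call is ordinary deterministic code, so input that beats the prompt still has to get past the classifier and the tool gate.

from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    params: dict

@dataclass
class Proposal:
    text: str
    ack: str = ""
    tool_call: ToolCall | None = None

def injection_classifier(text: str) -> bool:
    # Layer 2 stub: a real system runs a trained classifier here.
    return "ignore previous instructions" in text.lower()

def model_respond(text: str) -> Proposal:
    # Layer 1 stub: stands in for the prompted model call.
    return Proposal(text="Happy to help.")

def enforce_output_policy(text: str) -> str:
    # Layer 4 stub: deterministic checks on the final reply.
    return text if len(text) <= 280 else text[:280]

def handle_utterance(text: str) -> str:
    if injection_classifier(text):
        return "Sorry, I can't help with that."
    proposal = model_respond(text)
    if proposal.tool_call is None:
        return enforce_output_policy(proposal.text)
    # Layer 3: the tool gate; validated params then go to the actuator service.
    validated = validate_tool_call(proposal.tool_call.name,
                                   proposal.tool_call.params)
    return proposal.ack or f"Done: {validated}"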

A car cabin is also a rough place for voice AI. Road noise, overlapping speech, kids yelling, passengers joking around, maybe someone trying to bait the system for a TikTok clip. That raises the bar for intent verification and abuse handling, fast.

Latency matters a lot in a moving car

People will forgive a slow chatbot on a website. They won’t forgive it in a vehicle.

If you ask to lower the temperature or explain why the car stopped, the response has to feel immediate. For simple informational replies, something under roughly 300 ms feels quick. Tool-backed actions can stretch a bit, maybe 700 ms, but not much before riders start repeating themselves or assume it failed.

That affects the design:

  • some intent classification probably needs to happen close to the edge
  • frequent requests should be cached
  • token streaming helps mask latency
  • offline or degraded modes matter if connectivity drops

Fallback behavior matters almost as much as the main path. If Gemini can’t reach the cloud, the assistant shouldn’t fake competence. It should drop cleanly to a smaller feature set, maybe local controls and canned help flows, and say so plainly.
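A minimal sketch of that degradation, assuming a hypothetical cloud_assistant round trip and a tiny local intent table. The timeout is illustrative, not a Waymo figure:

import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
CLOUD_TIMEOUT_S = 1.5  # illustrative budget, not a published number

def cloud_assistant(utterance: str) -> str:
    # Stub: the real call is the round trip to the hosted model.
    time.sleep(10)  # simulate a dead link
    return "(cloud reply)"

LOCAL_INTENTS = {
    "temperature": "Adjusting the temperature.",
    "lights": "Adjusting the lighting.",
}

def respond(utterance: str) -> str:
    future = _pool.submit(cloud_assistant, utterance)
    try:
        return future.result(timeout=CLOUD_TIMEOUT_S)
    except concurrent.futures.TimeoutError:
        # Degraded mode: local controls and canned help, stated plainly.
        for phrase, reply in LOCAL_INTENTS.items():
            if phrase in utterance.lower():
                return reply
        return "I'm having trouble connecting. Cabin controls still work from the screen."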

Obvious, yes. Still easy to get wrong.

Waymo is treating trust like an engineering constraint

The prompt’s tone is notably conservative. The assistant gives short answers. It doesn’t speculate. It doesn’t narrate incidents. It stays away from emergency handling and service tasks like ordering food.

That can feel narrow, but it’s probably the right call for an early deployment. Conversational AI is fun right up until a rider asks whether the car is safe after a hard brake, why it swerved, or whether a sensor failed. One hallucinated answer there can turn a support problem into a regulatory problem.

Waymo already uses Gemini’s broader world knowledge to help train for rare driving scenarios. Putting the model in front of riders is a separate problem with different failure modes. Training support and live interaction should not be treated as the same thing, and Waymo doesn’t appear to be making that mistake.

Tesla’s Grok integration points in a different direction. More conversational, more personality, looser framing. That may suit Tesla’s brand. It also leaves more room for confusion inside a vehicle, where the line between companion and control surface can blur quickly.

Waymo’s version is less flashy. It’s probably the better product call.

What developers should take from it

If you’re building an LLM assistant for a regulated app, a vehicle, a medical workflow, an industrial UI, or anything near physical systems, this pilot offers a solid blueprint.

Keep the tool surface boring

Start with low-risk actions. Cabin temperature, music, lights. Fine. Route changes, emergency workflows, anything actuator-heavy, anything that could affect safety or create a false sense of control should stay out until the guardrails are proven.

Separate identity from critical systems

The assistant should have a clear name and a clear role. If users confuse the chat layer with the autonomy stack, you have a problem already. Distinct labels, distinct permissions, distinct logs.

Treat model output as hostile until validated

Every function_call needs policy checks after the model emits it. Bounds checking, auth, cooldowns, state awareness, rollback paths. A friendly natural-language layer should not be allowed to smuggle unsafe state changes into production.
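A sketch of that last mile, with set_climate standing in for whatever actuator service actually sits underneath:

import logging
import time

audit = logging.getLogger("assistant.audit")

def set_climate(target_temp: int) -> None:
    # Stub: the real call goes to the cabin climate service.
    pass

def execute_with_audit(params: dict, current: dict) -> None:
    # Log before acting, so every state change traces back to a request.
    audit.info("adjust_climate params=%s prev=%s ts=%.0f",
               params, current, time.time())
    try:
        set_climate(**params)
    except Exception:
        set_climate(**current)  # roll back to the last known-good state
        audit.warning("adjust_climate failed, rolled back to %s", current)
        raise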

Design for abuse, not just normal use

Prompt injection is the obvious issue. Audio attacks get less attention and deserve more. You need voice activity detection, secondary intent checks, replay resistance where possible, and throttling for repeated commands. “Set the cabin to 85” fifty times in a minute is annoying for riders and useful as a systems test.
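Throttling that case takes a few lines. A hypothetical per-tool cooldown, coalescing repeats instead of hammering the climate service:

import time

COOLDOWN_S = 10.0
_last_accepted: dict[str, float] = {}

def throttled(tool_name: str) -> bool:
    # True if this tool fired too recently; acknowledge once, act once.
    now = time.monotonic()
    last = _last_accepted.get(tool_name)
    if last is not None and now - last < COOLDOWN_S:
        return True
    _last_accepted[tool_name] = now
    return False

Fifty identical requests in a minute then collapse to a handful of accepted calls, and the rest get a single acknowledgment.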

Watch privacy drift

Using first names and trip history can make the interaction smoother. It can also get invasive quickly. Data retention, redaction, consent, and in-car notice screens matter here. Raw audio storage should be minimal, if it exists at all.
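One concrete step is redaction before persistence. A minimal sketch, again illustrative rather than anything Waymo has described:

import re

def redact(text: str, rider_first_name: str) -> str:
    # Strip the rider's name and phone-like strings before anything is logged.
    text = re.sub(re.escape(rider_first_name), "[RIDER]", text,
                  flags=re.IGNORECASE)
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "[PHONE]", text)
    return text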

The bigger shift

The point of this pilot isn’t that riders can ask Gemini to change the cabin temperature. Any decent automation stack can do that.

What matters is where these models are starting to show up: as tightly constrained interface layers in places where software used to be rigid, scripted, or missing altogether. That’s a believable near-term use case. One general model running an entire safety-critical product is not.

Waymo’s setup, at least from the prompt details that have surfaced, treats the model as a conversational front end for a narrow, policy-controlled system. Less magic. Far more deployable.

In robotaxis, deployable wins.
