Google I/O 2025 preview: Gemini APIs, agent tooling, and Android 16
Google I/O 2025 looks like a push to sell the whole AI stack
Google I/O 2025 runs May 20 to 21 at Shoreline Amphitheatre, and the message already looks pretty clear: Google wants developers buying into the full stack, from agent tooling and model APIs to Android UX.
The headline items are familiar enough. Gemini gets a higher-end tier. Android 16 gets a new design language and platform updates. Projects like Astra and Mariner keep moving toward actual products. The interesting part is how closely these pieces line up. Google seems focused on making the stack feel connected, not just showing a pile of separate demos.
That matters if you build AI products for a living. Model quality is only one part of the buying decision now. Latency, context limits, workflow orchestration, mobile integration, and permission boundaries matter just as much. Google knows that.
Gemini is getting more stratified
The biggest AI update expected at I/O is Gemini Ultra, positioned as Google’s flagship model for low-latency multimodal work across text, code, and images. Google is also expected to add Premium Plus and Premium Pro subscription tiers on top of the current $20/month Gemini Advanced plan.
That looks like pricing cleanup. It’s also how cloud AI platforms package capability into something enterprises can actually buy. A single “best model” works for headlines. It’s less useful when you’re trying to plan production workloads.
If Google is offering higher throughput, longer context windows, and collaboration features by tier, then the pitch is about service levels as much as raw model quality. That’s useful for engineering teams. You can map workloads to budget and latency targets instead of routing everything to the biggest model and hoping nobody notices the bill.
The rumored 256k-token context window is the kind of spec vendors love to overplay, but it does matter in a few real cases:
- document-heavy copilots
- codebase-aware assistants
- multimodal review systems that combine images and text
- long-running agent sessions that need persistent state without constant summarization
The downside is obvious. Bigger context windows raise costs and encourage lazy prompt design. If your system only works because you dumped a quarter-million tokens into the prompt, the architecture probably needs work in retrieval, memory management, or task decomposition.
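To make the retrieval point concrete: instead of dumping an entire corpus into the prompt, select only the chunks relevant to the query. A deliberately crude sketch using word overlap as the relevance score (a stand-in for real retrieval such as embeddings or BM25; the function and data here are illustrative):

```python
def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by shared-word overlap with the query and keep the top k.

    A crude stand-in for real retrieval (embeddings, BM25, etc.).
    """
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)[:k]

docs = [
    "lease agreement for office floor two",
    "inspection checklist for the floor plan",
    "quarterly revenue summary",
]
print(top_k_chunks("floor plan inspection", docs, k=1))
```

Even this toy version cuts prompt size by an order of magnitude compared to sending all three documents, which is the whole argument against context-window maximalism.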
Google’s case gets stronger if Ultra is materially faster on mixed-input requests. That’s where multimodal systems often get awkward in practice. Benchmarks are easy. Processing an image, reasoning over text, generating structured output, and returning inside a tolerable latency window is harder.
The implementation pattern is straightforward enough. Sketched here with a hypothetical client interface, since the actual SDK surface for these tiers isn't public:

```python
# Hypothetical SDK surface; the real package, class, and tier names may differ.
from google.gemini import GeminiClient

client = GeminiClient(api_key="YOUR_KEY")

# Route expensive multimodal work to the top tier, everything else down a level.
tier = "ultra" if high_complexity else "advanced"

response = client.chat(
    model=tier,
    prompt="Analyze this floor plan image and generate an inspection report.",
)
print(response.text)
```
The code is simple. The production decision behind it isn’t. Teams will need routing logic based on input type, latency SLOs, request complexity, and margin. The vendors that make that easier have an edge. Google wants to be one of them.
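What that routing logic might look like once input type, size, and latency budget all feed the decision. A minimal sketch, assuming a team defines its own tier policy (the tier names and thresholds are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Request:
    has_image: bool       # multimodal input present
    est_tokens: int       # estimated prompt size
    latency_slo_ms: int   # latency budget for this call

def pick_tier(req: Request) -> str:
    """Route a request to a model tier by input type, size, and latency budget.

    Tier names are placeholders; map them to whatever Google actually ships.
    """
    if req.has_image or req.est_tokens > 100_000:
        return "ultra"      # multimodal or very long context: top tier
    if req.latency_slo_ms < 500:
        return "flash"      # tight latency budget: smallest, fastest tier
    return "advanced"       # sensible default for everything else

print(pick_tier(Request(has_image=True, est_tokens=2_000, latency_slo_ms=2_000)))   # ultra
print(pick_tier(Request(has_image=False, est_tokens=1_000, latency_slo_ms=200)))    # flash
print(pick_tier(Request(has_image=False, est_tokens=5_000, latency_slo_ms=2_000)))  # advanced
```

The point is that the policy lives in your code, driven by your SLOs and margins, not in the vendor's defaults.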
Astra and Mariner are where it gets riskier
Gemini upgrades are expected. The bigger shift may be Google’s continued push around Astra and Project Mariner.
Astra is described as a platform for custom AI agents that can maintain state, process text, audio, and vision in real time, and handle long-running tasks. Mariner goes further into autonomous web action: form filling, scraping, and cross-site navigation based on user intent.
That’s ambitious. It’s also where the safety and reliability problems get messy.
Developers have wanted better agent primitives for a while. Not notebook demos with a few bolted-on tools, but workflow engines that can persist context, recover from failure, and operate across interfaces. If Astra ships with usable state management and solid multimodal coordination, it could save teams from rebuilding the same orchestration layer again and again.
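The orchestration layer teams keep rebuilding has roughly this shape: persist step-level state after each step so a crashed run resumes instead of restarting. A toy sketch, with file-based checkpointing standing in for whatever Astra actually provides:

```python
import json
import tempfile
from pathlib import Path

def run_workflow(steps, checkpoint: Path) -> dict:
    """Run named steps in order, checkpointing after each so a rerun resumes.

    Each step receives the accumulated state and its output is stored under
    its name, so later steps can read earlier results.
    """
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {"done": []}
    for name, fn in steps:
        if name in state["done"]:
            continue  # already completed in a previous (possibly crashed) run
        state[name] = fn(state)
        state["done"].append(name)
        checkpoint.write_text(json.dumps(state))  # persist progress to disk
    return state

steps = [
    ("fetch",     lambda s: "raw data"),             # e.g. pull from an API
    ("normalize", lambda s: s["fetch"].upper()),     # transform the fetched data
]

with tempfile.TemporaryDirectory() as d:
    result = run_workflow(steps, Path(d) / "checkpoint.json")
print(result["done"])
```

If the process dies after "fetch", the next invocation with the same checkpoint file skips it and runs only "normalize". That resume-on-failure behavior is exactly what's missing from notebook demos.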
Mariner is harder to trust. Autonomous web agents sound great until they touch brittle, regulated, rate-limited, or sensitive systems. That covers a lot of the useful web.
There’s real value here. An agent that can gather data from scattered web sources, normalize it, and hand it into a downstream pipeline would remove a lot of manual glue work. Sales ops, support, and internal analytics teams could get value fast. Low-code flow builders would make that even more tempting for non-engineers.
But web agents are only as trustworthy as their permission model, site compliance behavior, and failure handling. A system that can traverse pages and submit forms needs strong sandboxing, explicit consent boundaries, and audit logs. Otherwise you get automation that looks productive right up until it leaks data or trips abuse protections.
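What explicit consent boundaries could mean in practice: every proposed action is checked against an allowlist and written to an audit log before anything executes. A minimal sketch; the policy shape is invented here, since Mariner's actual permission model is unknown:

```python
from datetime import datetime, timezone

# Explicit, user-granted (domain, action) pairs; everything else is denied.
ALLOWED = {("example.com", "read"), ("example.com", "submit_form")}

audit_log: list[dict] = []

def attempt(domain: str, action: str) -> bool:
    """Permit an agent action only if (domain, action) was explicitly granted.

    Every attempt is logged, including denials, so there is an audit trail.
    """
    allowed = (domain, action) in ALLOWED
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "domain": domain,
        "action": action,
        "allowed": allowed,
    })
    return allowed

print(attempt("example.com", "read"))          # permitted: explicitly granted
print(attempt("bank.example", "submit_form"))  # denied: never consented to
```

Deny-by-default plus a complete log is the minimum bar; without both, "the agent did something on a site you use" becomes impossible to investigate after the fact.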
Google’s problem here is technical and political. The web already has a strained relationship with automated access. If Mariner behaves like a policy-aware assistant, it has a chance. If it feels like industrialized scraping with a friendly interface, expect pushback.
Android 16 is part of the AI story
On the Android side, I/O is expected to center on Android 16 and Material 3 Expressive, along with updates like Auracast, lock screen widgets, accessibility improvements, and developer sessions around Android XR and Wear OS.
Material 3 Expressive could get dismissed as a design refresh. That would miss the point. Design systems shape what product teams actually ship. If Google is adding new motion patterns, dynamic color behavior, and more responsive action elements, it’s trying to make Android interfaces feel more adaptive. That fits neatly with the broader AI push.
For teams using Jetpack Compose, the practical effect is pretty clear. New Compose APIs tied to Material 3 Expressive will likely become the preferred path for apps that want first-party transitions and interaction patterns. If you maintain a large Android codebase, that usually means two things:
- some design debt gets easier to pay down
- some UI assumptions are about to age badly
Auracast support is less flashy and probably more useful. Bluetooth broadcast audio has obvious applications in classrooms, tours, accessibility tools, and shared media environments. It sounds niche until you work on a product where assistive audio or shared listening is the whole use case.
Lock screen widgets matter too, especially if Google wants AI-powered summaries, reminders, or context-aware status surfaces to become part of daily phone use. The lock screen is an obvious place for that. But it only works if battery drain and notification fatigue are kept in check and privacy controls hold up. Android has a habit of turning useful surface area into clutter.
The accessibility updates deserve attention. Voice Access improvements and typography controls for low-vision users aren’t side notes. They affect whether AI-assisted mobile experiences are actually usable or just easy to demo.
A workflow, not a feature list
The cleanest read on these announcements is as an end-to-end stack:
- Mariner gathers or acts on web data
- Astra coordinates agent logic and long-running tasks
- Gemini Ultra handles multimodal inference
- Android 16 becomes the user-facing runtime surface
That’s a stronger story than a generic bundle of new AI features. It puts Google closer to a full operating environment for AI applications. OpenAI has model mindshare. Microsoft has enterprise distribution. Meta has open-weight pull. Google’s advantage is that it controls large parts of the delivery chain: model infrastructure, browser access, mobile OS, productivity apps, and cloud tooling.
Whether it can make those parts feel coherent is still an open question.
Google has plenty of capable technology. Product consistency has been the weaker spot. Developers will care less about flashy demos than about stable APIs, predictable quotas, sane SDKs, and whether the platform strategy survives longer than a keynote cycle.
The trade-offs are easy to spot
A few technical issues sit underneath all of this.
Scaling multimodal inference is expensive. If Ultra is truly low-latency at high volume, Google has to make the economics work across GPU and TPU fleets. For customers, the issue is simpler: multimodal requests get expensive fast, and batching or fallback strategies become mandatory.
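One common fallback strategy: try the cheap tier first and escalate only when the result fails a confidence check. Sketched below with stub functions standing in for real model calls and a real confidence estimator (all names here are illustrative):

```python
def answer(prompt, call_model, confidence, threshold=0.8):
    """Try the cheap tier first; escalate to the expensive tier on low confidence.

    call_model(tier, prompt) and confidence(text) are injected so the routing
    logic stays independent of any particular SDK.
    """
    draft = call_model("flash", prompt)
    if confidence(draft) >= threshold:
        return draft, "flash"               # cheap tier was good enough
    return call_model("ultra", prompt), "ultra"  # pay more only when needed

# Stubs for demonstration; replace with real API calls and scoring.
fake_call = lambda tier, prompt: f"[{tier}] {prompt}"
low_conf  = lambda text: 0.3   # simulates a weak draft answer
high_conf = lambda text: 0.9   # simulates a confident draft answer

print(answer("summarize", fake_call, high_conf))  # stays on the cheap tier
print(answer("summarize", fake_call, low_conf))   # escalates to the big model
```

If most traffic clears the threshold on the cheap tier, the expensive model's cost applies only to the hard tail of requests, which is what makes the economics tolerable.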
Agent reliability is still weak across the industry. Long-running workflows fail in annoying, non-deterministic ways. State drifts. Web UIs change. Permissions expire. If Astra and Mariner don’t ship with strong observability and recovery hooks, teams will end up babysitting “autonomous” systems.
Privacy gets harder when agents can see and do more. A model answering a question is one thing. An agent that traverses sites, reads documents, and submits actions creates a much larger blast radius.
Android UI fragmentation also hasn’t gone anywhere. Material 3 Expressive may look great in Google’s own demos, but OEM customization still complicates design consistency and rollout timing. That’s the usual Android tax.
What to watch at I/O
For developers, the most important details probably won’t be the headline names. Watch the specifics:
- context limits, rate limits, and latency numbers for Gemini tiers
- pricing that makes model routing practical
- whether Astra exposes real workflow controls or polished demos
- what permission and audit model Mariner uses
- how much of Material 3 Expressive is immediately usable in Compose
- whether Android XR has substance beyond preview-session enthusiasm
If Google gets those details right, I/O 2025 could mark a shift from a company with a lot of AI products to one with a platform people can actually build on. If not, this will be another year of promising parts that still need too much stitching by the teams paying for them.
That’s the bar now: operational usefulness.