Stripe Billing adds token billing for AI products with usage-based pricing
Stripe is trying to make AI billing sane
Stripe has a new preview feature in Stripe Billing called token billing. The pitch is simple: if your product runs on LLMs, Stripe wants model usage to show up as a clean revenue line instead of a pricing mess.
A lot of AI products still have a basic business problem. They sell a tidy monthly plan while paying variable, provider-specific token costs underneath. If usage jumps, or traffic shifts from a cheaper model to a better one, margins can disappear fast. Agent workflows make that worse because one user action can fan out into multiple model calls, retrieval steps, and tool runs.
Stripe’s answer is cost-plus billing for AI. Track the model cost, apply a configured margin, and invoice the customer from there.
It’s a practical feature. It’s also a sign that flat-rate AI pricing is getting harder to defend once products hit real usage.
What Stripe is shipping
In the preview, Stripe Billing can ingest token usage, map it to model pricing, calculate the underlying cost, and apply a markup or margin before generating billable line items.
The setup works with Stripe’s own AI gateway, but Stripe says it also supports third-party gateways such as Vercel and OpenRouter. That matters. Most teams don’t want billing tied to one inference path, and multi-model routing is already normal.
The flow is straightforward:
- Your app or gateway records a model call with token counts and metadata.
- Stripe matches the model_id to a pricing catalog.
- It computes raw cost based on provider rates.
- It adds your configured markup.
- The charge lands in Stripe Billing as a metered item.
If you want to hold a steady 30% markup on top of model cost, this is the sort of thing it’s for.
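The flow above can be sketched in a few lines. This is a minimal illustration, not Stripe's actual catalog or API: the model name, per-token rates, and the 30% markup are all hypothetical.

```python
# Hypothetical per-token provider rates in USD, with input and output
# priced separately (as most providers do).
RATES = {
    "example-model": {"input": 3.00 / 1_000_000, "output": 15.00 / 1_000_000},
}

def billable_amount(model_id: str, input_tokens: int, output_tokens: int,
                    markup: float = 0.30) -> float:
    """Raw provider cost plus a configured markup on cost."""
    rate = RATES[model_id]
    cost = input_tokens * rate["input"] + output_tokens * rate["output"]
    return round(cost * (1 + markup), 6)

# One request: 20k input tokens, 2k output tokens.
amount = billable_amount("example-model", 20_000, 2_000)
```

The point is only the shape of the pipeline: resolve a canonical model_id, compute raw cost from catalog rates, then apply the configured markup before anything reaches an invoice.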
Stripe says its own AI gateway does not add a markup today. That makes the billing piece more interesting. Stripe is effectively saying you can route however you want and let Stripe handle pricing logic and invoicing.
That puts it in direct competition with gateway vendors that already make money by reselling model access with a fee baked in. OpenRouter, for example, already offers pass-through pricing with its own markup tier and budget controls. Stripe is moving into the same territory from the billing side instead of the routing side.
Why engineering teams should care
For developers, this lands upstream of finance.
Once you support multiple providers, dynamic routing, or premium fallback models, pricing stops being a simple product decision and turns into a systems problem. You need a reliable chain from request to usage event to billable item. If that chain is sloppy, you either undercharge customers or eat the margin yourself.
That’s happened a lot.
Plenty of AI apps launched with broad “unlimited” plans because it made the signup page easier. Then users started hammering long-context models, agents got chatty, and token spend blew past the original assumptions. Suddenly the clean SaaS plan had model COGS swinging every day.
Token billing is Stripe acknowledging a reality most teams already know: AI software often behaves more like cloud infrastructure than seat-based SaaS. Usage drives cost. Pricing has to reflect that somewhere.
Customers won’t all love it. Nobody enjoys staring at token charges. But many enterprise buyers would rather see a transparent usage line than get trapped in a fuzzy plan that gets repriced six months later because the vendor guessed wrong.
The technical part that gets messy fast
On paper, token billing is simple. Count tokens, multiply by price, send the invoice.
The mess shows up in at least three places.
Pricing catalogs change constantly
Providers cut prices, launch new models, retire old ones, split prompt and completion pricing, or bill multimodal requests with different unit rules. Some charge separately for input and output tokens. Some bill image generation differently. Some use model names that don’t map cleanly across gateways.
So Stripe needs a versioned pricing catalog that’s stable enough for audits and flexible enough to track upstream changes. If a provider updates pricing at noon, requests from 11:58 a.m. still need deterministic billing. That’s accounting work, not just metering.
Teams integrating this should cache catalog data locally and keep a clear record of which price version applied to each event.
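One way to make "which price version applied" deterministic is to keep dated catalog versions locally and resolve each event by its timestamp. A minimal sketch, assuming you mirror provider pricing into versioned entries (the dates, names, and rates here are invented):

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Price versions sorted by effective time; each billed event records
# which version applied to it.
CATALOG = [
    (datetime(2025, 1, 1, tzinfo=timezone.utc), "v1", 5.00 / 1_000_000),
    (datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc), "v2", 3.00 / 1_000_000),
]

def price_at(ts: datetime):
    """Return (version, per-token rate) effective at ts."""
    idx = bisect_right([c[0] for c in CATALOG], ts) - 1
    _, version, rate = CATALOG[idx]
    return version, rate

# A request from 11:58 a.m. on cutover day still bills at the old version.
version, rate = price_at(datetime(2025, 6, 1, 11, 58, tzinfo=timezone.utc))
```

Storing the resolved version alongside the usage event is what makes the audit trail hold up later: the invoice line can always be traced back to the exact rate that was in effect.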
Usage ingestion needs strong IDs and clean semantics
If you’re feeding usage into Stripe, idempotency matters. You need request_id, customer_id, a canonical model_id, timestamps, and ideally separate input_tokens and output_tokens if billing differs between them.
That’s just the baseline. In multi-tenant plans, you may also need subscription_item_id, feature flags, regional pricing context, and a signed event ID so you can prove the data wasn’t tampered with.
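A usage event with those fields might look like the sketch below. The field names are assumptions for illustration, not Stripe's payload schema; the in-memory set stands in for whatever durable store actually enforces idempotency.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class UsageEvent:
    request_id: str        # doubles as the idempotency key
    customer_id: str
    model_id: str          # canonical name, mapped across gateways
    input_tokens: int
    output_tokens: int
    occurred_at: str       # ISO-8601 UTC timestamp

_seen: set[str] = set()

def ingest(event: UsageEvent) -> bool:
    """Drop replays: the same request_id must never bill twice."""
    if event.request_id in _seen:
        return False
    _seen.add(event.request_id)
    # ... forward asdict(event) to the billing API here ...
    return True
```

Separating input_tokens from output_tokens at the event level is the part that is easy to skip and painful to retrofit once providers price them differently.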
Per-request events also get noisy fast. A busy AI product can emit a huge volume of usage records, especially if agents are chaining calls behind the scenes. Batching into short aggregation windows helps, but there’s a trade-off: you cut event pressure and Stripe API traffic, and in exchange you lose some real-time visibility and make debugging harder.
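The batching idea is simple enough to show directly. A rough sketch, assuming per-request events are rolled up per customer and model into fixed windows before being reported (the dictionary event shape here is illustrative):

```python
from collections import defaultdict

def aggregate(events, window_s=60):
    """Sum token counts per (customer_id, model_id, window bucket)."""
    buckets = defaultdict(lambda: [0, 0])  # [input_tokens, output_tokens]
    for e in events:
        key = (e["customer_id"], e["model_id"], e["ts"] // window_s)
        buckets[key][0] += e["input_tokens"]
        buckets[key][1] += e["output_tokens"]
    return dict(buckets)
```

The window size is the knob: a one-minute window might collapse thousands of agent-chain calls into one reported event, at the cost of no longer being able to point at the exact request behind a charge.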
Markup and margin get confused all the time
Finance teams know the difference. Product teams often don’t.
A 30% markup on cost is not the same as a 30% margin on revenue. If your billing settings get that wrong, your unit economics model is fantasy. It sounds like a small detail until it ships to production because everyone assumed the same number meant the same thing.
Stripe can automate the math. It can’t fix fuzzy internal definitions.
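The arithmetic behind the confusion is worth spelling out once, on a $1.00 model cost:

```python
cost = 1.00

# 30% markup: price is cost plus 30% of cost.
markup_price = cost * (1 + 0.30)          # 1.30

# 30% margin: price such that 30% of the price is profit.
margin_price = cost / (1 - 0.30)          # ~1.4286

# The markup price's actual margin: 0.30 / 1.30, about 23.1%.
implied_margin = (markup_price - cost) / markup_price
```

If product configures "30%" as a markup while finance models it as a margin, roughly seven points of expected margin quietly evaporate on every invoice.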
What Stripe gets right
Stripe isn’t inventing some new theory of AI pricing here. It’s giving teams a way to stop pretending their costs are stable when they clearly aren’t.
That helps in a few obvious ways:
- Multi-model apps are easier to monetize. If traffic shifts between Anthropic, OpenAI, or whatever routing layer sits in front, billing can still track actual cost.
- Finance gets cleaner visibility. You can reconcile provider spend against customer revenue instead of relying on blended averages and guesswork.
- Developers write less billing glue. A lot of teams have built brittle homegrown token meters because nothing off the shelf handled AI usage well.
There’s also a competitive angle. If Stripe becomes the default place where AI usage turns into invoices, it gains another sticky layer in the stack. Payments was already sticky. Billing is even stickier. Now it wants AI cost accounting too.
That’s smart business. It also says something about where the AI stack is going. More of it is ending up inside infrastructure products.
The limits are real
This won’t solve AI pricing on its own.
First, customers don’t think in tokens. They think in tasks, reports, chats, agents, seats, documents, whatever the product actually does. A pricing page that exposes token economics too directly starts to feel like AWS, and most application buyers don’t want that.
So token billing probably works best behind the scenes or as a detailed overage layer, not as the only thing customers see. Good products will still translate token consumption into something legible in the UI.
Second, token counts are only part of the cost. Caching, embeddings, vector database traffic, tool execution, browser automation, and long-running agent loops all add spend outside the narrow input-plus-output-token bucket. If your product has heavy non-LLM costs, token billing can be precise and still miss a lot of the actual COGS picture.
Third, trust matters. Once usage-based billing touches invoices, every duplicate event, catalog mismatch, or delayed pipeline turns into a billing dispute. Engineers usually treat metering as plumbing right up until finance asks why provider invoices don’t line up with revenue.
That’s why this feature sits somewhere between observability pipeline, accounting system, and fraud surface.
What teams should watch before adopting it
If you’re testing the preview, a few implementation details matter more than the announcement suggests.
Keep prompt data out of billing events unless you truly need it. Token logs can expose sensitive text, and billing pipelines are a bad place to discover you stored too much.
Use signed, idempotent events. If a compromised client or buggy worker can replay usage, you’ve built an invoice inflation bug.
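A minimal version of the signing step, using an HMAC over the serialized event so a forged or tampered payload can be rejected at ingestion. Key management and rotation are out of scope here, and the secret is obviously a placeholder:

```python
import hashlib
import hmac
import json

SECRET = b"rotate-me"  # hypothetical shared secret, never hardcoded in practice

def sign(event: dict) -> str:
    """HMAC-SHA256 over a canonical JSON serialization of the event."""
    payload = json.dumps(event, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(event: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(event), signature)
```

Combined with idempotency keys, this means a replayed event is dropped and a modified one fails verification, which closes off the invoice-inflation path described above.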
Add customer budgets and spend alerts early. Transparent billing helps. Surprise bills still cause damage.
And think hard about where the meter lives. If usage is reported by the client, the data will be garbage. If it comes from your application tier, you still need reconciliation against the actual provider response. The cleanest source is usually the gateway or server-side inference layer that sees both the request and the provider’s usage numbers.
Stripe is right about the direction. AI products need billing infrastructure that understands variable model costs and can preserve margins without forcing every team to become a small cloud vendor. The hard part is obvious too. Once billing sits this close to model usage, mistakes end up on invoices.