Anthropic adds weekly rate limits to Claude Code, changing the math for power users
Anthropic has added weekly rate limits to Claude Code on top of the existing five-hour caps, and for heavy users that changes the product in a meaningful way.
The new setup has two quota buckets:
- a weekly overall usage limit across models
- a model-specific weekly limit for Anthropic’s most expensive models, including Sonnet 4 and Opus 4
These limits apply across Claude Code’s paid tiers, including the $20 Pro plan, the $100 tier, and the $200 Max plan. Anthropic says a typical Pro subscriber gets about 40 to 80 hours of Sonnet 4 usage per week before hitting the cap. Max users get much more room, around 240 to 480 hours on Sonnet 4 and 24 to 40 hours on Opus 4.
Those numbers sound roomy until you look at how people actually use Claude Code. Serious users run long sessions, feed in large codebases, keep agents busy across multiple tasks, and in some cases let it run all day. Anthropic says a small group was using Claude Code 24/7 and even reselling access. After seven major or partial outages in a month, the company seems to have decided the flat-rate model was getting abused past the point it could tolerate.
Why Anthropic chose tokens
The important detail is that these weekly limits are token-based, not purely time-based.
That makes sense from Anthropic’s side. Time is a rough proxy for cost. Tokens are much closer to the thing that actually consumes compute. A short prompt with a short answer is cheap. A giant repo dump plus generated code, retries, summaries, and tool calls is expensive.
For transformer models, token volume tracks inference load far better than elapsed time. More tokens usually means more attention work, more memory pressure, more GPU time, and longer queue occupancy. If your goal is to stop a small number of users from soaking up shared capacity, token accounting is the cleaner tool.
It also blocks easy workarounds. Users can split tasks across sessions or run multiple things at once, but the token count still reflects the total load. Internally, enforcement probably looks a lot like a standard admission-control check before a request hits the queue:
```python
# Reject any request that would push the user's weekly token total past the cap.
if user.weekly_tokens + request.tokens > weekly_limit:
    reject(request)   # quota exhausted; surface an error to the client
else:
    enqueue(request)  # admit the request to the serving queue
```
That part is straightforward. The second cap matters more.
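The model-specific bucket just adds one more check to the same admission path. A minimal sketch of how dual-bucket enforcement could work; all names and numbers are hypothetical, since Anthropic hasn't published its enforcement logic:

```python
# Hypothetical dual-bucket quota check: one weekly overall limit plus a
# separate weekly limit for premium models. Names are illustrative.
PREMIUM_MODELS = {"sonnet-4", "opus-4"}

def admit(usage: dict, model: str, request_tokens: int,
          weekly_limit: int, premium_limit: int) -> bool:
    """Return True (and record the spend) if the request fits both buckets."""
    if usage["total"] + request_tokens > weekly_limit:
        return False  # overall weekly cap exhausted
    if model in PREMIUM_MODELS and usage["premium"] + request_tokens > premium_limit:
        return False  # premium-model weekly cap exhausted
    usage["total"] += request_tokens
    if model in PREMIUM_MODELS:
        usage["premium"] += request_tokens
    return True
```

Note that a cheap-model request can still be rejected once the overall bucket is drained, which is exactly the behavior heavy users will notice first.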
The model caps show where the costs are
Anthropic isn’t only limiting overall usage. It’s also separately capping Sonnet 4 and Opus 4. That says a lot.
Premium coding models are where the economics get ugly. They’re the ones people want for real refactors, large-context repo analysis, and agentic workflows that span many files. They also carry the heaviest per-user compute load. So Anthropic is putting guardrails around the expensive path while still leaving cheaper models to handle routine work.
It’s a practical move. It also weakens the original sales pitch for these tools. Flat monthly pricing works when usage is light and fairly predictable. It breaks down once customers start treating the product like rented compute.
The broader shift across AI coding is hard to miss. Vendors are moving away from soft “all you can eat” subscriptions and toward something that looks a lot more like cloud metering with a nicer interface. Anthropic is just saying it out loud.
Why now
The immediate answer is capacity pressure.
Anthropic has had repeated outages, and Claude Code usage appears to be growing faster than the company can serve peak demand. That’s not surprising. Inference capacity is still tight, especially for high-end models with long context windows and sustained interactive use. You can add GPUs, expand clusters, tune schedulers, and squeeze more from the serving stack, but none of that happens quickly.
Coding assistants also create a nasty workload pattern. Developers don’t use them like casual chatbots. They hit them hard during working hours, often with long prompts, code diffs, repo context, and repeated iterations. Add agent loops and background tasks and you get cloud-scale demand from a desktop IDE.
Anthropic’s problem probably isn’t average demand. It’s the tail. A small fraction of customers can consume a wildly disproportionate share of the pool.
And if the company’s claim about account resale is accurate, the old pricing model was subsidizing unauthorized mini-platforms.
What changes for developers and teams
For casual users, probably not much. For anyone using Claude Code as part of the daily development loop, this matters right away.
Long-running agent sessions get riskier
If your workflow depends on long coding sessions that inspect lots of files, revise output repeatedly, and keep context alive for hours, weekly caps become something you have to manage. You can burn through quota faster than the advertised “hours” suggest if your prompts are large or the model is verbose.
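A back-of-envelope calculation shows why advertised "hours" and real hours diverge. All numbers here are assumptions for illustration, not published Anthropic figures:

```python
# Rough quota-burn estimate: how long sustained use lasts under a weekly
# token cap. Every number below is an assumed, illustrative value.
def hours_until_cap(weekly_token_cap: int,
                    tokens_per_request: int,
                    requests_per_hour: int) -> float:
    """Hours of sustained use before the weekly cap is exhausted."""
    return weekly_token_cap / (tokens_per_request * requests_per_hour)

# The same cap supports wildly different hour counts depending on workload:
light = hours_until_cap(50_000_000, tokens_per_request=5_000, requests_per_hour=20)
heavy = hours_until_cap(50_000_000, tokens_per_request=80_000, requests_per_hour=60)
```

Under these assumed numbers, light chat-style use stretches the cap to hundreds of hours, while a repo-heavy agent loop exhausts the identical cap in roughly a working day.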
Shared team accounts get even worse
They were already a bad idea for security and compliance. Now they’re a bad operational choice too. One person’s heavy session can eat quota for everyone else, and if Anthropic is actively looking for abuse patterns, shared credentials are asking for trouble.
CI and automation need a fallback
If you’ve wired Claude Code or Anthropic models into internal tooling, code review assistants, doc generation, or migration workflows, you need graceful degradation. A hard stop from quota exhaustion is annoying in chat. In a delivery pipeline, it can become a blocker.
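Graceful degradation in a pipeline step can be as simple as catching the quota error and marking the step skipped instead of failing the build. The client callable and exception name below are hypothetical placeholders:

```python
# Degrade gracefully when a model call hits a quota wall: record a warning
# and continue the pipeline rather than hard-failing. Names are placeholders.
class QuotaExhausted(Exception):
    pass

def ai_review_step(diff: str, call_model) -> dict:
    """Run the AI review if quota allows; otherwise mark it skipped."""
    try:
        return {"status": "ok", "review": call_model(diff)}
    except QuotaExhausted:
        # Don't block the delivery pipeline on a soft dependency.
        return {"status": "skipped", "review": None,
                "warning": "AI review skipped: weekly quota exhausted"}
```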
Budgeting gets less fuzzy
The upside is that token-based limits force teams to treat consumption as a real resource instead of a vague subscription perk. That’s irritating, but useful. It pushes teams to separate high-value model use from AI busywork.
The response is mostly boring engineering
Teams that rely on Claude Code should treat this like any other constrained dependency.
First, add visibility. If Anthropic exposes usage dashboards or APIs, feed that data into your own telemetry. If not, log request sizes and model selection on your side. You need to know which workflows are burning premium tokens and whether they’re worth it.
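Client-side logging doesn't need to be elaborate. A sketch of a usage record tagged by workflow; the four-characters-per-token estimate is a rough rule of thumb, not an exact tokenizer:

```python
# Log an estimated token count and model choice per request, tagged by
# workflow, so you can see which workflows burn premium tokens.
import json
import time

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Good enough for trends.
    return max(1, len(text) // 4)

def log_usage(workflow: str, model: str, prompt: str, completion: str) -> dict:
    record = {
        "ts": time.time(),
        "workflow": workflow,
        "model": model,
        "prompt_tokens": estimate_tokens(prompt),
        "completion_tokens": estimate_tokens(completion),
    }
    print(json.dumps(record))  # or ship to your telemetry backend
    return record
```

Aggregating these records by `workflow` is usually enough to answer the "is this worth it" question.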
Second, route work by model class. Don’t spend Sonnet 4 or Opus 4 on everything.
Use the premium path for:
- repo-wide reasoning
- architectural changes
- large refactors
- debugging ugly, stateful failures
Push lighter tasks elsewhere:
- boilerplate generation
- unit test scaffolding
- straightforward edits
- repetitive transforms
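The split above can be enforced with a trivial routing layer. Model identifiers and task categories here are illustrative, not an official mapping:

```python
# Route tasks to a model tier by category, mirroring the premium/routine
# split above. Model names and category labels are illustrative.
PREMIUM_TASKS = {"repo_reasoning", "architecture", "large_refactor", "stateful_debug"}

def pick_model(task_type: str) -> str:
    if task_type in PREMIUM_TASKS:
        return "opus-4"  # expensive path, spent deliberately
    return "haiku"       # cheaper default for routine work
```

The point isn't the two-line function; it's that model choice becomes a policy decision made in one place instead of an ad-hoc habit.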
Third, cache where it makes sense. Teams often pay repeatedly for the same summaries, code explanations, style-guide transforms, and migration hints. If the input hasn’t changed, store the output.
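A content-addressed cache keyed on the model and prompt is often enough. The in-memory dict below is a sketch; a real deployment would swap in Redis or a file store:

```python
# Cache completions keyed on a hash of (model, prompt): if the input
# hasn't changed, reuse the stored output instead of paying for tokens again.
import hashlib

_cache: dict = {}

def cached_completion(model: str, prompt: str, call_model) -> str:
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(model, prompt)  # pay for tokens only once
    return _cache[key]
```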
Fourth, build a fallback path. If Claude Code is embedded in internal systems, support another model provider, or at least a local option for lower-value tasks. That doesn’t mean every team should sprint toward self-hosting. It does mean single-vendor assumptions are getting harder to justify.
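The fallback itself is a short loop over an ordered provider list. The provider callables here are hypothetical stand-ins for real client SDKs:

```python
# Multi-provider fallback: try the primary model, then fall back to a
# secondary provider or a local model if the call fails. Providers are
# hypothetical callables standing in for real client SDKs.
def complete_with_fallback(prompt: str, providers: list) -> str:
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as err:  # quota exhaustion, outage, timeout, ...
            last_error = err      # remember the failure, try the next one
    raise RuntimeError("all providers failed") from last_error
```

Ordering the list so the cheap or local option comes last for premium tasks, and first for routine ones, combines neatly with the routing layer above.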
This is also a pricing story
Anthropic’s new limits expose a problem that has been hanging over AI coding products for a while. Users want simple subscriptions. Providers are dealing with infrastructure costs that don’t fit simple subscriptions very well.
The rough pattern across the market looks like this:
- low-friction plans to drive adoption
- hidden or soft usage ceilings
- premium tiers with better throughput
- eventual movement toward metered or hybrid pricing
That pattern goes beyond Anthropic. AI coding vendors are running into the same math cloud providers learned years ago. If one customer can consume 100x the resources of another while paying the same sticker price, the pricing model won’t hold.
For buyers, “unlimited” is now mostly a marketing term unless it comes with a formal SLA and a very expensive contract.
Second-order effects
This should push teams to get stricter about prompt efficiency and context discipline. Dumping giant repos into every request was always wasteful. Quotas just make the waste easier to see.
It should also make tooling around AI usage more valuable. Expect more internal dashboards, routing layers, token budgets by team, and policy controls around which models can be used for which tasks.
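A per-team budget plus a model allowlist covers most of that policy surface. Team names, budgets, and model lists below are illustrative assumptions:

```python
# Per-team weekly token budgets with a policy check on model choice.
# Teams, budget sizes, and model names are illustrative assumptions.
BUDGETS = {"platform": 5_000_000, "docs": 500_000}
ALLOWED = {"platform": {"haiku", "sonnet-4", "opus-4"}, "docs": {"haiku"}}

def check_policy(team: str, model: str, spent: int, request_tokens: int) -> bool:
    """True if the team may spend these tokens on this model this week."""
    if model not in ALLOWED.get(team, set()):
        return False  # model not permitted for this team
    return spent + request_tokens <= BUDGETS.get(team, 0)
```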
And it gives a modest tailwind to local and self-hosted coding models. Not because they suddenly beat the best commercial systems. They don’t. But if commercial assistants keep adding limits around heavy usage, running smaller open models for routine work starts to look less like tinkering and more like sensible capacity planning.
Anthropic’s move makes sense from the company’s side. The old setup was probably unsustainable. It’s still a downgrade for the people who relied most heavily on Claude Code, and there’s no reason to pretend otherwise.
If you run engineering workflows on top of these tools, treat model access like infrastructure. That’s what it is now.