Generative AI August 2, 2025

Claude Code Rate Limits Now Include Weekly Quotas. What It Means for Teams

Anthropic puts weekly caps on Claude Code, and power users will feel it

Anthropic has added two weekly rate limits to Claude Code on top of the existing five-hour rolling limit. For teams that lean on Claude Code for long coding sessions, refactors, agent loops, or CI-driven generation, that means a hard weekly ceiling now sits over the usual short-term throttle.

Anthropic says the change is about reliability. That’s believable. Claude Code has seen enough demand to strain capacity, and Anthropic’s status history has shown enough instability to make the explanation ring true. It also reflects a basic product and pricing problem. “Unlimited” AI coding access was always going to get tested once heavy users showed up.

The new setup adds:

  • a weekly overall usage cap across Claude Code models
  • a weekly model-specific cap, which for Max users meters Opus 4 separately from Sonnet 4
  • the existing five-hour rolling window, which still limits short-term bursts

Anthropic ties those limits to token usage, not wall-clock time. That’s the right way to count it. Tokens map to actual inference cost far better than “hours used,” especially for coding workloads where one user sends a quick patch request and another runs a repo-wide agent loop with a huge context window.
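To make the gap concrete, here is a back-of-envelope sketch comparing two sessions of the same wall-clock length. The per-token prices, request counts, and token sizes are illustrative assumptions, not Anthropic's actual numbers:

```python
# Why token accounting diverges from hours: two one-hour sessions,
# wildly different inference cost. All numbers below are illustrative.

def session_cost(requests, prompt_tokens, completion_tokens,
                 price_in=3e-6, price_out=15e-6):
    """Rough inference cost of a session, in dollars."""
    return requests * (prompt_tokens * price_in + completion_tokens * price_out)

# An hour of quick patch requests: small prompts, small outputs.
patcher = session_cost(requests=30, prompt_tokens=2_000, completion_tokens=500)

# An hour of a repo-wide agent loop: huge context resent every turn.
agent = session_cost(requests=30, prompt_tokens=150_000, completion_tokens=4_000)

print(f"patcher: ${patcher:.2f}  agent: ${agent:.2f}  ratio: {agent / patcher:.0f}x")
```

Same "hours used," roughly a forty-fold difference in load, which is exactly why time-based quotas would misprice heavy workflows.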

For paid tiers, Anthropic’s rough guidance is:

  • Pro ($20/month): about 40 to 80 hours of Sonnet 4 per week
  • Max ($200/month): about 240 to 480 hours of Sonnet 4, plus 24 to 40 hours of Opus 4 per week

Once you hit the cap, Claude Code gets throttled until the weekly reset.

Why Anthropic is doing this

This is mainly a compute management move.

Coding assistants produce ugly demand patterns. A regular chatbot session is noisy but usually bounded. A coding tool can turn into a constant stream of high-context requests: scan the repo, inspect diffs, generate tests, fix lint errors, retry, reason again, retry again. Pipe that through IDE integrations or agents and token usage climbs fast.

That’s especially true on stronger models. Sonnet 4 sits in the sweet spot for serious coding work, which means it gets hammered. Opus 4 costs even more to run, with heavier reasoning and pricier inference. If Anthropic lets a small group of users camp on those models all week, everyone else gets slower responses and more outages.

So Anthropic is doing what model providers do when demand outruns supply: meter access harder.

That will irritate paying users, but it’s a rational infrastructure decision. The harder question is whether the quotas are clear enough to be useful in practice.

Token limits make sense, even if nobody likes them

Developers tend to prefer time-based language because it’s easy to picture. “Forty hours a week” sounds clear enough. The system underneath still has to run on tokens.

That’s because tokens track things that matter operationally:

  • GPU time
  • memory pressure from long prompts
  • output length
  • total inference cost
  • scheduler load during peak usage

A user asking for ten short code completions is cheap. A user repeatedly sending large repository context, requesting long diffs, and asking for multi-step reasoning is expensive. Same product, very different load.

At the gateway layer, this kind of rate limiting is simple on paper and messy in production. Requests come in with prompt tokens, generate completion tokens, and increment counters in a distributed store keyed by user, plan, and model. A five-hour limiter can use a rolling window or leaky bucket to curb bursts. A weekly cap sits beside it as a longer-window counter.

The hard parts are the usual ones:

  • eventual consistency if counters are replicated
  • streaming responses that stop early
  • retries from IDE clients
  • background agent calls users forgot were still running
  • plan upgrades mid-cycle
  • separate accounting for Sonnet and Opus

That last bit matters. Separate caps for the expensive models are a blunt tool, but they work.
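A minimal in-memory sketch of that two-layer scheme might look like the following. Class names, caps, and the per-(user, model) keying are assumptions for illustration; a production gateway would back this with a distributed store and handle the consistency, retry, and streaming edge cases listed above:

```python
import time
from collections import defaultdict, deque

FIVE_HOURS = 5 * 3600
ONE_WEEK = 7 * 24 * 3600

class TokenLimiter:
    """Toy two-window token limiter: 5-hour rolling cap plus weekly cap."""

    def __init__(self, rolling_cap, weekly_cap):
        self.rolling_cap = rolling_cap      # tokens per 5-hour rolling window
        self.weekly_cap = weekly_cap        # tokens per rolling week
        self.events = defaultdict(deque)    # (user, model) -> deque of (ts, tokens)

    def _used_since(self, key, now, window):
        q = self.events[key]
        # The weekly window is the longest we track, so older events can go.
        while q and q[0][0] <= now - ONE_WEEK:
            q.popleft()
        cutoff = now - window
        return sum(tokens for ts, tokens in q if ts > cutoff)

    def allow(self, user, model, tokens, now=None):
        now = time.time() if now is None else now
        key = (user, model)
        if self._used_since(key, now, FIVE_HOURS) + tokens > self.rolling_cap:
            return False                    # short-term burst throttle
        if self._used_since(key, now, ONE_WEEK) + tokens > self.weekly_cap:
            return False                    # weekly quota exhausted
        self.events[key].append((now, tokens))
        return True
```

Keying the counters by model is what makes separate Sonnet and Opus accounting possible; the same structure just runs twice with different caps.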

What changes for real teams

If you use Claude Code casually, you may barely notice. If it’s part of your daily engineering throughput, you probably will.

The teams with the most exposure are the ones doing one or more of these:

  • long interactive coding sessions in the IDE
  • repo-wide edits with large context windows
  • autonomous or semi-autonomous agent loops
  • CI or pre-merge automation that calls Claude Code repeatedly
  • heavy use of Opus for hard debugging or architecture work

The problem isn’t only hitting the limit. It’s hitting it at the wrong time. Weekly quotas create a scheduling problem. A team can burn through capacity early in the cycle and hit a wall during a release push, a migration, or an incident.

That changes how technical leads should treat AI coding tools. Claude Code is now something closer to a managed infrastructure dependency with finite throughput.

Three practical shifts follow.

1. Get actual usage telemetry

If you’re paying for Claude Code across a team, measure it.

Track token consumption by:

  • user
  • repo
  • task type
  • model
  • hour and day

Without that breakdown, you won’t know whether your spend is buying real productivity or disappearing into a few noisy workflows chewing through Sonnet in the background.

Set alerts before the cap. Eighty percent is a sensible threshold. Pipe it into Slack, Datadog, Grafana, or whatever your team already watches.

Finding out only after throttling is an ops failure.
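A minimal sketch of that kind of ledger, assuming a hypothetical weekly cap and a pluggable alert hook (Slack webhook, Datadog event, or anything else your team watches):

```python
from collections import Counter

class UsageLedger:
    """Toy per-team token ledger with a threshold alert. The cap and
    dimensions are assumptions, not anything Anthropic exposes directly."""

    def __init__(self, weekly_cap, alert_at=0.8, alert=print):
        self.weekly_cap = weekly_cap
        self.alert_at = alert_at
        self.alert = alert
        self.by_key = Counter()   # (user, repo, task, model) -> tokens
        self.total = 0

    def record(self, user, repo, task, model, tokens):
        self.by_key[(user, repo, task, model)] += tokens
        # Fire exactly once, when this record crosses the threshold.
        crossed = self.total < self.weekly_cap * self.alert_at <= self.total + tokens
        self.total += tokens
        if crossed:
            self.alert(f"weekly token quota at {self.total / self.weekly_cap:.0%}")

    def top(self, n=3):
        """The noisiest workflows, for the 'where did the spend go' question."""
        return self.by_key.most_common(n)
```

The `top()` view is the part that answers the productivity question: it tells you whether quota is going to feature work or to one agent loop nobody remembered to turn off.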

2. Prompt discipline now has a cost attached

A lot of teams still treat context windows like free storage. They aren’t.

If Claude Code usage is now capped weekly, prompt sprawl turns into a budget problem. That means:

  • stop dumping full files when diffs will do
  • cache repeated repo summaries
  • avoid resending unchanged context every turn
  • break large tasks into smaller bounded requests
  • save heavy reasoning passes for code that actually needs them

This all sounds obvious, yet most AI coding workflows are still wasteful. The model can tolerate sloppy prompts. Your quota can’t.
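The cheapest win on that list is sending diffs instead of whole files. A sketch using Python's stdlib `difflib`, where the file name and contents are made up for illustration:

```python
import difflib

def minimal_context(old: str, new: str, path: str = "app.py") -> str:
    """Return a unified diff when it is smaller than the full new file;
    fall back to the file itself for rewrites where a diff wouldn't help."""
    diff = "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))
    return diff if len(diff) < len(new) else new

# A one-line change in a long file: the diff is a fraction of the size.
old = "def add(a, b):\n    return a + b\n" + "# filler\n" * 200
new = old.replace("a + b", "a + b  # TODO: overflow check")
patch = minimal_context(old, new)
print(f"full file: {len(new)} chars, diff: {len(patch)} chars")
```

Characters are only a proxy for tokens, but the ratio carries over: a three-line hunk with context costs a tiny slice of what resending the file does, every single turn.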

3. Build fallback paths

If Claude Code sits anywhere in your delivery path, you need a degraded mode.

That might mean:

  • switching low-priority tasks to a cheaper model
  • routing simple completion work to a local or open model
  • queueing non-urgent jobs until reset
  • reserving Opus for the hard cases
  • keeping human review workflows intact when automation gets throttled
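Those rules can be sketched as a small router. The model names, task fields, and quota threshold here are placeholders, not real APIs; the point is that the routing policy is explicit and testable rather than implicit in whoever hits the cap first:

```python
from dataclasses import dataclass, field

@dataclass
class Router:
    """Toy degraded-mode router for when the weekly quota runs low."""
    quota_left: int                        # weekly tokens remaining (hypothetical)
    queue: list = field(default_factory=list)

    def route(self, task):
        if self.quota_left <= 0:
            if task["urgent"]:
                return "local-model"       # degraded mode; human review stays on
            self.queue.append(task)        # hold until the weekly reset
            return "queued"
        if task["priority"] == "low":
            return "local-model"           # don't spend quota on easy completions
        if task["kind"] == "hard-debugging":
            return "opus"                  # reserve the expensive model
        return "sonnet"
```

Even a policy this crude beats the default, which is burning Opus quota on boilerplate in week one and having nothing left for the incident in week two.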

This is where a lot of AI tooling still feels immature. Teams wire a model into critical workflows, then discover the service behaves like a scarce cloud resource.

Because that’s what it is.

This pattern is spreading

Anthropic isn’t alone. The industry has been backing away from the fiction of unlimited AI access for a while.

Cursor, Replit, and other AI-heavy developer tools have run into the same problem: flat-fee plans attract a small number of very heavy users whose actual compute cost wrecks the economics. At that point vendors tighten limits, degrade service, or absorb the margin hit. Most choose limits.

The technical reason is plain enough. Frontier-model inference is still expensive, and demand spikes are still hard to smooth out. Capacity planning gets worse when coding assistants create bursty, recursive workloads. One user can trigger a small compute storm.

Expect more of this across the market:

  • explicit weekly caps
  • model-specific quotas
  • priority queues for higher tiers
  • off-peak incentives
  • harder metering for agentic workflows

The vendors that handle this best will be the ones that tell developers what they get, when they’re close to the wall, and how to monitor it.

Anthropic has made the first part clearer. The second still leans too much on rough “hours” estimates that hide wide variance. For engineering teams, weekly-hour estimates are useful for budgeting. Token-level reporting is what helps you run the thing.

A policy tweak that changes workflow

It would be easy to write this off as routine rate-limit housekeeping. That misses the point.

Claude Code is turning into infrastructure. Once a tool sits inside daily coding flow, quota policy starts shaping engineering behavior: which model gets used, which tasks get automated, when jobs run, how much context gets sent, and whether teams can rely on the tool during crunch time.

Anthropic’s new limits are defensible. They may improve service quality for the median user. They also close off a phase of AI coding adoption where teams could pretend top-tier model access was effectively unbounded for a flat monthly fee.

That phase was temporary. Anthropic has now made it explicit.

What to watch

The caveat is that agent-style workflows still depend on permission design, evaluation, fallback paths, and human review. A demo can look autonomous while the production version still needs tight boundaries, logging, and clear ownership when the system gets something wrong.
