Claude's usage system is the part most paying subscribers wrestle with. Every plan ties you to a rolling budget, and the same prompt can cost wildly different amounts depending on which product you use, how long your conversation is, and how many files or tools are loaded into context. Understanding the rules is the difference between finishing a task and staring at a "limit reached" screen at 2pm.
Quick answer: Claude measures usage in tokens across a rolling 5-hour session window plus a weekly cap. Longer chats, file uploads, Extended Thinking, and Cowork sessions consume tokens faster than short Chat messages, so the fastest way to stretch your quota is to keep conversations short, switch to Sonnet or Haiku for simple tasks, and start fresh chats when the topic changes.
How Claude counts usage
A token is roughly four characters of English text, or about three-quarters of a word. Every time you send a message, Claude re-reads the entire conversation from the beginning before generating a response. That means message one is cheap, but message 30 in the same chat forces the model to reprocess 29 previous exchanges plus all attached files before it even starts thinking about your new question.
This compounding behavior is why long sessions burn through quotas so quickly. One developer tracked a long Cowork session and found roughly 98.5% of tokens were spent re-reading history, with only about 1.5% going to the actual new output. Files, system prompts, tool definitions, memory files, and Extended Thinking traces all add to that re-read on every turn.
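The compounding effect is easy to see with a back-of-envelope model. This sketch assumes a fixed per-turn overhead and a flat tokens-per-message figure, both made-up round numbers for illustration, not Anthropic's actual accounting:

```python
# Illustrative sketch: why re-reading history makes long chats expensive.
# tokens_per_message and fixed_overhead are assumed round numbers.

def conversation_cost(turns, tokens_per_message=300, fixed_overhead=2_000):
    """Total input tokens if every turn re-reads all prior messages plus
    a fixed overhead (system prompt, files, tool definitions)."""
    total = 0
    for turn in range(1, turns + 1):
        history = (turn - 1) * 2 * tokens_per_message  # prior user+assistant pairs
        total += fixed_overhead + history + tokens_per_message
    return total

short = conversation_cost(5)    # 17,500 tokens
long = conversation_cost(30)    # 330,000 tokens
print(short, long)
```

Six times as many turns costs roughly nineteen times the tokens, which is the quadratic growth the 98.5% figure above reflects.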
Claude plans and what each one includes
Anthropic sells four consumer tiers, with usage limits scaling from "enough to try it" up to "5x to 20x more than Pro." The Pro plan is billed at $17 per month with the annual discount ($200 billed up front) or $20 monthly. Max starts at $100 per month and scales up based on how much more usage you want relative to Pro.
| Plan | Price | Usage | Notable features |
|---|---|---|---|
| Free | $0 | Baseline | Chat on web, iOS, Android, desktop; code and data visualization; web search; file creation; Extended Thinking; remote MCP connectors |
| Pro | $17/mo annual or $20/mo | More than Free | Claude Code, Cowork, unlimited projects, Research, memory across conversations, Claude in Excel and Chrome, access to more models |
| Max (5x) | From $100/mo | 5x Pro | Higher output limits, early access features, priority at peak times, Claude in PowerPoint |
| Max (20x) | Higher tier | 20x Pro | Same as 5x Max with a much larger token budget for heavy Claude Code and Cowork use |
Usage limits reset on a rolling 5-hour session window, and a separate weekly cap sits on top of that. Burn the session cap and you wait for the window to roll off. Burn the weekly cap and you wait until the weekly reset, which can be several days.
Where you can check your usage
Anthropic shows a usage progress bar inside the account settings for Pro and Max subscribers at claude.ai/settings/usage. The page displays how much of your current session and weekly allotment has been consumed, but it does not break usage down by token counts, models, or individual sessions.
Claude Code writes detailed JSONL logs locally to ~/.claude/projects/ regardless of your plan, with per-session token counts (input, output, cache creation, cache read) and the model used. Third-party dashboards can parse those logs for a fuller picture, which is useful if you want to see exactly where your tokens are going. Cowork sessions run server-side and do not write local transcripts, so they aren't captured by local tools.
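As a sketch of what those local dashboards do, the snippet below sums token counts across every JSONL log under the projects directory. The directory path comes from the text above; the field names (`message.usage.input_tokens` and friends) are assumptions about the log schema, so inspect a few lines of your own logs before relying on them:

```python
# Sketch: total up token counts from Claude Code's local JSONL logs.
# The usage field names are assumed — verify against your own log files.
import json
from collections import Counter
from pathlib import Path

def summarize(projects_dir=Path.home() / ".claude" / "projects"):
    totals = Counter()
    for log in projects_dir.rglob("*.jsonl"):
        for line in log.read_text().splitlines():
            try:
                usage = json.loads(line).get("message", {}).get("usage", {})
            except json.JSONDecodeError:
                continue  # skip non-JSON lines defensively
            for key in ("input_tokens", "output_tokens",
                        "cache_creation_input_tokens", "cache_read_input_tokens"):
                totals[key] += usage.get(key, 0) or 0
    return totals

if __name__ == "__main__":
    for key, count in summarize().items():
        print(f"{key}: {count:,}")
```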
For teams, Anthropic supports OpenTelemetry exports from Claude Code. Administrators can enable telemetry by setting CLAUDE_CODE_ENABLE_TELEMETRY=1 and configuring an OTLP endpoint, then route metrics and events to a standard observability backend. Full details and the list of environment variables live in the Claude Code monitoring docs.
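A minimal configuration might look like the following. `CLAUDE_CODE_ENABLE_TELEMETRY` is from the text above; the `OTEL_*` variables are standard OpenTelemetry settings and the endpoint hostname is a placeholder, so confirm both against the monitoring docs and your collector setup:

```shell
# Sketch: enable Claude Code telemetry and route it to an OTLP collector.
# collector.internal:4317 is a hypothetical endpoint — substitute your own.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317
```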
Why different products cost different amounts
Not every interaction with Claude draws from the same token tap at the same rate. File creation (spreadsheets, documents, presentations) uses more of your limit than plain chat messages. Cowork sessions tend to be the heaviest because they read your project folder before every task. Claude Code sessions can burn tokens quickly because the agent explores files, reads directories, and runs checks if you don't tightly scope the request.
Models matter too. Opus is the most expensive per interaction, Sonnet sits in the middle, and Haiku is the lightest. Using Opus with Extended Thinking for a task Sonnet could finish in 20 seconds is the fastest way to shred a weekly limit.
What actually burns your quota
A handful of patterns account for most unexpected usage spikes. Uploading files without converting them, piling connectors and MCP servers into every session, and letting conversations run for dozens of messages are the three big ones.
| Action | Approximate token cost |
|---|---|
| Single PDF page | 1,500–3,000 tokens |
| 1000x1000 screenshot | ~1,300 tokens |
| Tightly cropped screenshot | Under 100 tokens |
| 20-message Cowork session | ~105,000 tokens |
| 30-message Cowork session | ~232,000 tokens |
| Bloated 22,000-word "about me" file | Tens of thousands of tokens per task |
The same PDF uploaded to five different chats gets re-tokenized five times. MCP servers and plugins also load their tool definitions into the context window on every request, which is why a fresh Claude Code shell can show several percent of the context window consumed before you've typed anything.
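The gap between re-uploading a PDF and pasting its extracted text can be estimated with the per-page range from the table and the rough four-characters-per-token heuristic. Both figures are approximations, not Anthropic's billing formula:

```python
# Back-of-envelope comparison: re-uploaded PDF vs. pasted Markdown text.
# PDF_TOKENS_PER_PAGE uses the upper end of the table's range; the
# 4-chars-per-token heuristic is a rough average for English text.

PDF_TOKENS_PER_PAGE = 3_000
CHARS_PER_TOKEN = 4

def pdf_cost(pages, chats):
    """Each chat re-tokenizes the whole PDF from scratch."""
    return pages * PDF_TOKENS_PER_PAGE * chats

def markdown_cost(text, chats):
    """Pasting extracted text costs only the characters that matter."""
    return (len(text) // CHARS_PER_TOKEN) * chats

extracted = "x" * 8_000               # ~8,000 chars of relevant text
print(pdf_cost(15, 4))                # 15-page PDF reused in 4 chats: 180000
print(markdown_cost(extracted, 4))    # same information as text: 8000
```

That 180,000-token figure is the same math behind the 15-page PDF example in Step 1 below.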
Practical ways to reduce token consumption
Most of the advice that actually moves the needle comes down to three ideas: shorten what Claude has to re-read, match the model to the task, and stop repeating work the system has already done.
Step 1: Convert files before uploading them. Open a blank doc, paste the relevant text, and save as Markdown or plain text. Crop screenshots tightly to only the region that matters. A 15-page PDF reused across four chats can balloon to 180,000+ tokens when a 2,000-token Markdown file would carry the same information.
Step 2: Plan in Chat, then build in Cowork or Code. Use Chat to sketch structure, agree on sections, and lock assumptions. Once you know exactly what you want, paste the plan into Cowork or Claude Code and ask it to build the artifact. Thinking happens in the cheaper product; the expensive product only executes.
Step 3: Edit previous messages instead of sending follow-ups. In Chat, clicking the edit button on an earlier message replaces that exchange rather than stacking a new "actually, I meant..." turn on top of the full history. Cowork doesn't allow inline edits, but it does let you restart the conversation from an earlier message, which prunes everything that came after.
Step 4: Start a new chat when the topic changes. A single thread that jumps from a LinkedIn post to a client proposal to a recipe keeps re-reading all the old content on every turn. New topic, new chat.
Step 5: Use Projects for recurring files. Uploading a brand guide, contract template, or research paper once into a Project lets all conversations in that project reference it without re-tokenizing. Paid plans also apply RAG, retrieving only the relevant chunks instead of loading the entire document into context.
Step 6: Pick the right model. Grammar checks, reformatting, brainstorming, and short answers are fine on Sonnet or Haiku. Save Opus with Extended Thinking for genuinely deep work. Switching models takes two clicks in the model selector; the option sometimes sits under "More models."
Step 7: Turn off features and connectors you aren't using for the current task. Web search, Research mode, Extended Thinking, and active connectors all add tokens to every response. A sensible default is to keep them off and enable per-task rather than per-account.
Claude Code specifics
Claude Code is the most sensitive to context bloat because it runs as an agent that can read files, search directories, and invoke tools. Every MCP server adds tool definitions to the prompt on each turn, which is why users with many MCPs installed often see 10–15% of their context consumed before they send a message.
Running /context inside Claude Code shows a breakdown of where tokens are going: system prompt, system tools, MCP tools, memory files, and messages. If MCP tools are taking a double-digit share, disabling servers you aren't using for the current task is the fastest win. If messages dominate, compacting the conversation or starting fresh will recover the most room.
Claude Code also supports extra usage and API billing as escape hatches. The /extra-usage command lets you pay beyond your subscription limit, and switching to API billing moves you off session caps entirely at per-token pricing.
When usage feels wrong
If usage suddenly looks much heavier than it did a week ago, check three things before assuming a change in limits. First, run /context in Claude Code to confirm your context window isn't being eaten by MCP servers or memory files you forgot about. Second, check whether Extended Thinking is enabled and whether you actually need it. Third, look at conversation length; a long-running chat with many file attachments will always consume more than a fresh one.
If the math still doesn't add up, Anthropic's support center at support.claude.com is the official route for account-specific questions, and status updates are posted at status.anthropic.com. Local dashboards built on Claude Code's JSONL logs can help document actual token counts if you need to compare usage across time periods.
Pricing reference for heavy users
For anyone considering the switch from subscription to API billing, Anthropic's current API pricing is a useful anchor. Subscription plans use a different cost structure (you're paying for access, not per token), but the API rates show the relative cost of each model.
| Model | Input | Output | Cache write | Cache read |
|---|---|---|---|---|
| claude-opus-4-6 | $5.00/MTok | $25.00/MTok | $6.25/MTok | $0.50/MTok |
| claude-sonnet-4-6 | $3.00/MTok | $15.00/MTok | $3.75/MTok | $0.30/MTok |
| claude-haiku-4-5 | $1.00/MTok | $5.00/MTok | $1.25/MTok | $0.10/MTok |
Cache read pricing is the reason prompt caching matters so much. Tokens served from cache cost roughly 10% of fresh input tokens, which is why repeated prompt structures and Projects with stable file sets are dramatically cheaper than constantly uploading new context.
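To make the discount concrete, here is a small cost calculator built on the Sonnet rates from the table above, treating those rates as given; check current pricing before budgeting against it:

```python
# Sketch: fresh vs. cached input cost at the per-MTok rates in the table.
# Rates are taken from the table as-is — verify against current pricing.

RATES = {  # USD per million tokens
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00,
                          "cache_write": 3.75, "cache_read": 0.30},
}

def turn_cost(model, fresh_in, cached_in, out):
    r = RATES[model]
    return (fresh_in * r["input"]
            + cached_in * r["cache_read"]
            + out * r["output"]) / 1_000_000

# A 50k-token prompt re-sent 10 times: all fresh vs. mostly cache hits.
fresh = 10 * turn_cost("claude-sonnet-4-6", 50_000, 0, 1_000)
cached = 10 * turn_cost("claude-sonnet-4-6", 2_000, 48_000, 1_000)
print(f"${fresh:.2f} vs ${cached:.2f}")  # prints "$1.65 vs $0.35"
```

The stable-prefix workload pays roughly a fifth of the fresh-input price, which is the economics that makes Projects with fixed file sets so much cheaper than re-uploads.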
The through-line across every Claude plan is the same: Claude charges for context, and context compounds with every turn. Keeping conversations short, matching the model to the task, and avoiding unnecessary file reloads does more for your quota than any plan upgrade. If you find yourself consistently hitting weekly caps despite those habits, that's usually the signal that your workload genuinely needs Max or API billing rather than a sign that Pro has failed you.