Claude's usage system is the part most paying subscribers wrestle with. Every plan ties you to a rolling budget, and the same prompt can cost wildly different amounts depending on which product you use, how long your conversation is, and how many files or tools are loaded into context. Understanding the rules is the difference between finishing a task and staring at a "limit reached" screen at 2pm.
Quick answer: Claude measures usage in tokens across a rolling 5-hour session window plus a weekly cap. Longer chats, file uploads, Extended Thinking, and Cowork sessions consume tokens faster than short Chat messages, so the fastest way to stretch your quota is to keep conversations short, switch to Sonnet or Haiku for simple tasks, and start fresh chats when the topic changes.
How Claude counts usage
A token is roughly four characters of English text, or about three-quarters of a word. Every time you send a message, Claude re-reads the entire conversation from the beginning before generating a response. That means message one is cheap, but message 30 in the same chat forces the model to reprocess 29 previous exchanges plus all attached files before it even starts thinking about your new question.
This compounding behavior is why long sessions burn through quotas so quickly. One developer tracked a long Cowork session and found roughly 98.5% of tokens were spent re-reading history, with only about 1.5% going to the actual new output. Files, system prompts, tool definitions, memory files, and Extended Thinking traces all add to that re-read on every turn.
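The compounding effect is easy to see with a back-of-envelope model. This sketch assumes a fixed per-turn overhead and a flat tokens-per-message figure, both made-up round numbers for illustration, not Anthropic's actual accounting:

```python
# Illustrative sketch: why re-reading history makes long chats expensive.
# tokens_per_message and fixed_overhead are assumed round numbers.

def conversation_cost(turns, tokens_per_message=300, fixed_overhead=2_000):
    """Total input tokens if every turn re-reads all prior messages plus
    a fixed overhead (system prompt, files, tool definitions)."""
    total = 0
    for turn in range(1, turns + 1):
        history = (turn - 1) * 2 * tokens_per_message  # prior user+assistant pairs
        total += fixed_overhead + history + tokens_per_message
    return total

short = conversation_cost(5)    # 17,500 tokens
long = conversation_cost(30)    # 330,000 tokens
print(short, long)
```

Six times as many turns costs roughly nineteen times the tokens, which is the quadratic growth the 98.5% figure above reflects.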
Claude plans and what each one includes
Anthropic sells four consumer tiers, with usage limits scaling from "enough to try it" up to "5x to 20x more than Pro." The Pro plan is billed at $17 per month with the annual discount ($200 billed up front) or $20 monthly. Max starts at $100 per month and scales up based on how much more usage you want relative to Pro.
| Plan | Price | Usage | Notable features |
|---|---|---|---|
| Free | $0 | Baseline | Chat on web, iOS, Android, desktop; code and data visualization; web search; file creation; Extended Thinking; remote MCP connectors |
| Pro | $17/mo annual or $20/mo | More than Free | Claude Code, Cowork, unlimited projects, Research, memory across conversations, Claude in Excel and Chrome, access to more models |
| Max (5x) | From $100/mo | 5x Pro | Higher output limits, early access features, priority at peak times, Claude in PowerPoint |
| Max (20x) | Higher tier | 20x Pro | Same as 5x Max with a much larger token budget for heavy Claude Code and Cowork use |
Usage limits reset on a rolling 5-hour session window, and a separate weekly cap sits on top of that. Burn the session cap and you wait for the window to roll off. Burn the weekly cap and you wait until the weekly reset, which can be several days.
Where you can check your usage
Anthropic shows a usage progress bar inside the account settings for Pro and Max subscribers at claude.ai/settings/usage. The page displays how much of your current session and weekly allotment has been consumed, but it does not break usage down by token counts, models, or individual sessions.
Claude Code writes detailed JSONL logs locally to ~/.claude/projects/ regardless of your plan, with per-session token counts (input, output, cache creation, cache read) and the model used. Third-party dashboards can parse those logs for a fuller picture, which is useful if you want to see exactly where your tokens are going. Cowork sessions run server-side and do not write local transcripts, so they aren't captured by local tools.
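As a sketch of what those local dashboards do, the snippet below sums token counts across every JSONL log under the projects directory. The directory path comes from the text above; the field names (`message.usage.input_tokens` and friends) are assumptions about the log schema, so inspect a few lines of your own logs before relying on them:

```python
# Sketch: total up token counts from Claude Code's local JSONL logs.
# The usage field names are assumed — verify against your own log files.
import json
from collections import Counter
from pathlib import Path

def summarize(projects_dir=Path.home() / ".claude" / "projects"):
    totals = Counter()
    for log in projects_dir.rglob("*.jsonl"):
        for line in log.read_text().splitlines():
            try:
                usage = json.loads(line).get("message", {}).get("usage", {})
            except json.JSONDecodeError:
                continue  # skip non-JSON lines defensively
            for key in ("input_tokens", "output_tokens",
                        "cache_creation_input_tokens", "cache_read_input_tokens"):
                totals[key] += usage.get(key, 0) or 0
    return totals

if __name__ == "__main__":
    for key, count in summarize().items():
        print(f"{key}: {count:,}")
```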
For teams, Anthropic supports OpenTelemetry exports from Claude Code. Administrators can enable telemetry by setting CLAUDE_CODE_ENABLE_TELEMETRY=1 and configuring an OTLP endpoint, then route metrics and events to a standard observability backend. Full details and the list of environment variables live in the Claude Code monitoring docs.
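A minimal configuration might look like the following. `CLAUDE_CODE_ENABLE_TELEMETRY` is from the text above; the `OTEL_*` variables are standard OpenTelemetry settings and the endpoint hostname is a placeholder, so confirm both against the monitoring docs and your collector setup:

```shell
# Sketch: enable Claude Code telemetry and route it to an OTLP collector.
# collector.internal:4317 is a hypothetical endpoint — substitute your own.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4317
```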
Why different products cost different amounts
Not every interaction with Claude draws from the same token tap at the same rate. File creation (spreadsheets, documents, presentations) uses more of your limit than plain chat messages. Cowork sessions tend to be the heaviest because they read your project folder before every task. Claude Code sessions can burn tokens quickly because the agent explores files, reads directories, and runs checks if you don't tightly scope the request.
Models matter too. Opus is the most expensive per interaction, Sonnet sits in the middle, and Haiku is the lightest. Using Opus with Extended Thinking for a task Sonnet could finish in 20 seconds is the fastest way to shred a weekly limit.
What actually burns your quota
A handful of patterns account for most unexpected usage spikes. Uploading files without converting them, piling connectors and MCP servers into every session, and letting conversations run for dozens of messages are the three big ones.
| Action | Approximate token cost |
|---|---|
| Single PDF page | 1,500–3,000 tokens |
| 1000x1000 screenshot | ~1,300 tokens |
| Tightly cropped screenshot | Under 100 tokens |
| 20-message Cowork session | ~105,000 tokens |
| 30-message Cowork session | ~232,000 tokens |
| Bloated 22,000-word "about me" file | Tens of thousands of tokens per task |
The same PDF uploaded to five different chats gets re-tokenized five times. MCP servers and plugins also load their tool definitions into the context window on every request, which is why a fresh Claude Code shell can show several percent of the context window consumed before you've typed anything.
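The gap between re-uploading a PDF and pasting its extracted text can be estimated with the per-page range from the table and the rough four-characters-per-token heuristic. Both figures are approximations, not Anthropic's billing formula:

```python
# Back-of-envelope comparison: re-uploaded PDF vs. pasted Markdown text.
# PDF_TOKENS_PER_PAGE uses the upper end of the table's range; the
# 4-chars-per-token heuristic is a rough average for English text.

PDF_TOKENS_PER_PAGE = 3_000
CHARS_PER_TOKEN = 4

def pdf_cost(pages, chats):
    """Each chat re-tokenizes the whole PDF from scratch."""
    return pages * PDF_TOKENS_PER_PAGE * chats

def markdown_cost(text, chats):
    """Pasting extracted text costs only the characters that matter."""
    return (len(text) // CHARS_PER_TOKEN) * chats

extracted = "x" * 8_000               # ~8,000 chars of relevant text
print(pdf_cost(15, 4))                # 15-page PDF reused in 4 chats: 180000
print(markdown_cost(extracted, 4))    # same information as text: 8000
```

That 180,000-token figure is the same math behind the 15-page PDF example in Step 1 below.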
Practical ways to reduce token consumption
Most of the advice that actually moves the needle comes down to three ideas: shorten what Claude has to re-read, match the model to the task, and stop repeating work the system has already done.
Step 1: Convert files before uploading them. Open a blank doc, paste the relevant text, and save as Markdown or plain text. Crop screenshots tightly to only the region that matters. A 15-page PDF reused across four chats can balloon to 180,000+ tokens when a 2,000-token Markdown file would carry the same information.
Step 2: Plan in Chat, then build in Cowork or Code. Use Chat to sketch structure, agree on sections, and lock assumptions. Once you know exactly what you want, paste the plan into Cowork or Claude Code and ask it to build the artifact. Thinking happens in the cheaper product; the expensive product only executes.
Step 3: Edit previous messages instead of sending follow-ups. In Chat, clicking the edit button on an earlier message replaces that exchange rather than stacking a new "actually, I meant..." turn on top of the full history. Cowork doesn't allow inline edits, but it does let you restart the conversation from an earlier message, which prunes everything that came after.
Step 4: Start a new chat when the topic changes. A single thread that jumps from a LinkedIn post to a client proposal to a recipe keeps re-reading all the old content on every turn. New topic, new chat.
Step 5: Use Projects for recurring files. Uploading a brand guide, contract template, or research paper once into a Project lets all conversations in that project reference it without re-tokenizing. Paid plans also apply RAG, retrieving only the relevant chunks instead of loading the entire document into context.
Step 6: Pick the right model. Grammar checks, reformatting, brainstorming, and short answers are fine on Sonnet or Haiku. Save Opus with Extended Thinking for genuinely deep work. Switching models takes two clicks in the model selector; the option sometimes sits under "More models."
Step 7: Turn off features and connectors you aren't using for the current task. Web search, Research mode, Extended Thinking, and active connectors all add tokens to every response. A sensible default is to keep them off and enable per-task rather than per-account.
Claude Code specifics
Claude Code is the most sensitive to context bloat because it runs as an agent that can read files, search directories, and invoke tools. Every MCP server adds tool definitions to the prompt on each turn, which is why users with many MCPs installed often see 10–15% of their context consumed before they send a message.
Running /context inside Claude Code shows a breakdown of where tokens are going: system prompt, system tools, MCP tools, memory files, and messages. If MCP tools are taking a double-digit share, disabling servers you aren't using for the current task is the fastest win. If messages dominate, compacting the conversation or starting fresh will recover the most room.
Claude Code also supports extra usage and API billing as escape hatches. The /extra-usage command lets you pay beyond your subscription limit, and switching to API billing moves you off session caps entirely at per-token pricing.
When usage feels wrong
If usage suddenly looks much heavier than it did a week ago, check three things before assuming a change in limits. First, run /context in Claude Code to confirm your context window isn't being eaten by MCP servers or memory files you forgot about. Second, check whether Extended Thinking is enabled and whether you actually need it. Third, look at conversation length; a long-running chat with many file attachments will always consume more than a fresh one.
If the math still doesn't add up, Anthropic's support center at support.claude.com is the official route for account-specific questions, and status updates are posted at status.anthropic.com. Local dashboards built on Claude Code's JSONL logs can help document actual token counts if you need to compare usage across time periods.
Pricing reference for heavy users
For anyone considering the switch from subscription to API billing, Anthropic's current API pricing is a useful anchor. Subscription plans use a different cost structure (you're paying for access, not per token), but the API rates show the relative cost of each model.
| Model | Input | Output | Cache write | Cache read |
|---|---|---|---|---|
| claude-opus-4-6 | $5.00/MTok | $25.00/MTok | $6.25/MTok | $0.50/MTok |
| claude-sonnet-4-6 | $3.00/MTok | $15.00/MTok | $3.75/MTok | $0.30/MTok |
| claude-haiku-4-5 | $1.00/MTok | $5.00/MTok | $1.25/MTok | $0.10/MTok |
Cache read pricing is the reason prompt caching matters so much. Tokens served from cache cost roughly 10% of fresh input tokens, which is why repeated prompt structures and Projects with stable file sets are dramatically cheaper than constantly uploading new context.
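To make the discount concrete, here is a small cost calculator built on the Sonnet rates from the table above, treating those rates as given; check current pricing before budgeting against it:

```python
# Sketch: fresh vs. cached input cost at the per-MTok rates in the table.
# Rates are taken from the table as-is — verify against current pricing.

RATES = {  # USD per million tokens
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00,
                          "cache_write": 3.75, "cache_read": 0.30},
}

def turn_cost(model, fresh_in, cached_in, out):
    r = RATES[model]
    return (fresh_in * r["input"]
            + cached_in * r["cache_read"]
            + out * r["output"]) / 1_000_000

# A 50k-token prompt re-sent 10 times: all fresh vs. mostly cache hits.
fresh = 10 * turn_cost("claude-sonnet-4-6", 50_000, 0, 1_000)
cached = 10 * turn_cost("claude-sonnet-4-6", 2_000, 48_000, 1_000)
print(f"${fresh:.2f} vs ${cached:.2f}")  # prints "$1.65 vs $0.35"
```

The stable-prefix workload pays roughly a fifth of the fresh-input price, which is the economics that makes Projects with fixed file sets so much cheaper than re-uploads.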
The through-line across every Claude plan is the same: Claude charges for context, and context compounds with every turn. Keeping conversations short, matching the model to the task, and avoiding unnecessary file reloads does more for your quota than any plan upgrade. If you find yourself consistently hitting weekly caps despite those habits, that's usually the signal that your workload genuinely needs Max or API billing rather than a sign that Pro has failed you.