If you have spent any time using AI tools for real work, you have probably run into a usage limit at some point. The conversation cuts off, or a message appears telling you that you have reached your limit and to try again later. It is frustrating, especially when you are in the middle of something. And if you have recently switched between platforms, you may have noticed that some feel more generous than others.

This guide covers what tokens actually are, how the limits compare across Claude, ChatGPT, and other providers, why agents eat through them so much faster than regular chat, and what you can do to stretch your usage further.

What is a token?

Before any of the rest of this makes sense, you need to understand what a token actually is.

Think of a token as a small chunk of text. It is roughly four characters, or about three quarters of a word in English. The word "hamburger" is about two tokens. The phrase "AI agents are changing how people work" is around eight tokens. A full page of text is somewhere in the ballpark of 500 to 800 tokens. If you want to see it in action, OpenAI's tokenizer playground lets you paste any text and watch it get broken into tokens in real time. It works the same way across most major AI models and is a great way to build intuition for how much different types of content actually cost.
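The rough ratios above can be turned into a quick back-of-envelope estimator. This is only the "four characters per token" rule of thumb, not a real tokenizer, so actual counts from a model will differ somewhat:

```python
def estimate_tokens(text: str) -> int:
    """Ballpark estimate using the ~4 characters-per-token rule of thumb.

    Real tokenizers split text into learned subword units, so actual
    counts vary -- this is intuition-building only, not an exact count.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("hamburger"))  # 2
# The heuristic overshoots slightly on this phrase (a real tokenizer
# counts around 8); it is a ballpark, not a guarantee.
print(estimate_tokens("AI agents are changing how people work"))  # 10
```

For anything precise, use the tokenizer playground mentioned above rather than a character-count heuristic.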

When you send a message to Claude, ChatGPT, or any large language model, your message and the full conversation history are all converted into tokens and processed together. The model does not read just your latest message in isolation; it reads the whole thread each time it generates a response. On very long conversations, providers will sometimes compress or summarize older parts of the history to keep things manageable, but the core mechanic holds: the more conversation that has accumulated, the more tokens are in play.

This matters because a long back-and-forth becomes more token-intensive with every message, even if your individual prompts are short.
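A rough model makes the growth concrete. Assuming (purely for illustration) that every message and reply averages 100 tokens, the history grows linearly per turn, but the running total of tokens processed grows quadratically, because the whole thread is resent every time:

```python
def thread_cost(turns: int, tokens_per_message: int = 100) -> tuple[int, int]:
    """Rough model of a back-and-forth conversation.

    Each turn adds your message plus the reply to the history, and the
    entire history is reprocessed on every turn. Returns (final history
    size, total tokens processed across all turns). The 100-token
    average is a hypothetical illustration, not a measured figure.
    """
    history = 0
    total = 0
    for _ in range(turns):
        history += 2 * tokens_per_message  # your message + the reply
        total += history                   # whole thread reprocessed
    return history, total

history, total = thread_cost(20)
print(history, total)  # 4000 42000
```

Twenty short exchanges leave a 4,000-token history, but over 42,000 tokens have been processed along the way, which is why starting a fresh conversation for a new topic saves so much.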

A practical mental model: imagine you are paying for fax paper. A quick note costs a little. A full report costs a lot. And if you keep re-faxing the entire conversation history every time you want to say one more thing, that paper adds up fast.

Before we continue: what are agents?

If you are new to AI agents, they are a step beyond chatbots. A chatbot answers questions. You type, it responds. An agent takes a goal and works through it: reading files, making decisions, taking actions, and delivering results with minimal back-and-forth. Think of it as the difference between asking someone a question and delegating a task. For a fuller explanation, see our guide on what AI agents are and how they differ from chatbots.

Why agents hit limits so much faster than chat

If you have been using Claude or ChatGPT for regular chat, you have probably never come close to hitting a limit in a single session. Then you try an agent for the first time and suddenly you are capped out in an afternoon. Here is why.

When you ask a chatbot a question, one message goes in, one response comes out. Maybe 500 tokens total.

When an agent like Claude Cowork or ChatGPT Agent handles a task, the process looks more like this: it reads your initial instruction, plans a sequence of steps, reads a file, processes what it found, reads another file, cross-references information, makes a decision, writes an output, checks its work, and reports back to you. Each of those steps involves model calls. Each model call uses tokens. A task that feels like "one request" from your side can involve dozens of internal operations.

If you give Cowork access to a folder containing 40 documents and ask it to synthesize a research summary, it might need to read all 40 files before it can write anything. Depending on how long those files are, you might burn through a quarter of your daily Pro allowance on that single task.
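A back-of-envelope estimate shows why a folder-sized task is expensive. Every number below is a hypothetical illustration (assumed file lengths, the 500–800 tokens-per-page range from earlier, and a guessed per-step overhead), not a published figure:

```python
# Back-of-envelope estimate for the 40-document example.
files = 40
pages_per_file = 3       # assumed average document length
tokens_per_page = 650    # midpoint of the ~500-800 range above
overhead_per_step = 500  # guessed planning/tool-call tokens per operation

reading = files * pages_per_file * tokens_per_page
overhead = files * overhead_per_step
print(f"~{reading + overhead:,} tokens just to read the folder")  # ~98,000
```

Under those assumptions the agent has consumed close to 100,000 tokens before it has written a single word of the summary.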

This is not a flaw. It is what makes agents genuinely useful. But it is worth understanding so you are not caught off guard.

The hard truth about all-day agent use

Here is something no one really wants to say but should: if you are running Claude Cowork or ChatGPT Agent for a full eight-hour workday on a Pro plan, you will hit your limits. Probably more than once.

Pro plans across all platforms were designed for regular, meaningful professional use. They were not designed for someone treating an AI agent as a full-time employee working nonstop alongside them. That is what Max and Pro tiers at $100 to $200 per month are for.

The good news is that if you are hitting limits that often, you are almost certainly getting more than $100 or $200 worth of value out of the tool. Think about what it would cost to hire a human assistant to do the same work. Even at $20 per hour, a single full day of productive work would run you $160. If Claude Cowork is doing that work reliably, the math makes the Max plan an obvious call. (To be clear, this is an analogy about value, not a suggestion to replace your employees with AI.)

The people who end up frustrated are the ones expecting all-day, every-day heavy agent use from a $20 plan. Use the right plan for your actual usage pattern, and the limits largely stop being a problem.

Peak hours and off-peak hours: what to know

In addition to rolling usage windows, providers have recently introduced another layer: peak and off-peak pricing. Anthropic and others now distinguish between high-demand periods (roughly US business hours on weekdays) and quieter times, and your usage during peak hours eats through your rolling window significantly faster than the same usage during off-peak hours.

Anthropic posted about this on Reddit, explaining how limits work and acknowledging the frustration. The response was necessary because a growing number of Pro plan users have been reporting that they hit their five-hour rolling window after just a handful of messages during busy periods:

Peak hours are weekdays, 5am–11am PT / 1pm–7pm GMT and you'll move through your 5-hour session limits faster than before. Your weekly limits remain unchanged.

This is not unique to Claude. As AI usage grows, all providers are dealing with the same capacity constraints during high-demand windows. But it does change the calculus for how you plan your work. If you are on a Pro plan and doing your heaviest agent work at 10am Eastern on a Tuesday, you are burning through your allowance at the worst possible rate.

If you have flexibility in when you do your most token-intensive work (especially if on the $20/month plan), shifting heavy sessions to off-peak hours can meaningfully extend how far your plan goes.

Best practices to manage AI token limits

You do not always need to upgrade. A lot of people hit limits more often than they should because of habits that quietly eat tokens without adding value. These are the easiest wins.

Start a new conversation for new topics. This is the single most impactful thing you can do. Every message you send gets processed alongside the entire conversation history. A conversation that started as a quick question but turned into a 40-message back-and-forth has accumulated thousands of tokens of history. Starting fresh wipes that slate clean and keeps your next task lean.

Match the scope of what you share to the scope of the task. For most tasks, uploading a full document is totally fine and often necessary. But think twice before giving an agent access to an entire folder of files to make a handful of small edits, or uploading a zip archive when only one file inside it is relevant. Large spreadsheets are a common culprit. A multi-megabyte Excel file full of rows with zeros, blanks, or data unrelated to your question can burn through tokens fast. If that is the situation, trim it down first: filter to the relevant rows, remove empty columns, and strip out anything the agent does not actually need to see. The rule of thumb is to match what you give the agent to the actual difficulty and scope of the task.
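If you are comfortable with a little scripting, trimming a spreadsheet before upload takes only a few lines. The sketch below uses Python's standard csv module; the column names and the nonzero-revenue filter are hypothetical examples, so adapt them to your own file:

```python
import csv

def trim_csv(src_path: str, dst_path: str, keep_columns: list[str],
             numeric_filter: str) -> int:
    """Copy only the needed columns from src_path to dst_path, skipping
    rows where the `numeric_filter` column is blank or zero.

    Returns the number of rows kept. Illustrative sketch only -- the
    column names and filter rule are assumptions, not a fixed recipe.
    """
    kept = 0
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=keep_columns)
        writer.writeheader()
        for row in reader:
            value = row.get(numeric_filter, "")
            if value and float(value) != 0:  # skip blanks and zero rows
                writer.writerow({c: row[c] for c in keep_columns})
                kept += 1
    return kept
```

A call like `trim_csv("sales-full.csv", "sales-trimmed.csv", ["customer", "region", "revenue"], "revenue")` (hypothetical filenames and columns) can shrink a multi-megabyte export down to only the rows and columns the agent actually needs.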

Use the lighter model for lighter tasks. Checking a document for typos, answering a quick factual question, reformatting a list: these do not need the heavy models. The lighter models handle these just fine, and they go easier on your usage quota. More on this with examples below.

Output tokens are far more expensive than input tokens. When you send a message, you are spending input tokens. When the AI responds, those are output tokens, and according to the published API pricing for both Claude and ChatGPT, output tokens cost roughly 3 to 5 times more than input tokens. Claude 3.5 Sonnet, for example, charges $3 per million input tokens and $15 per million output tokens. GPT-4o is $2.50 in versus $10 out. The ratio holds across most major models. What this means in practice is that the more you ask the AI to write, the faster you burn through your limits. Asking Claude to write you a full 2,000-word blog post from scratch is going to hit your usage much harder than uploading a 2,000-word document and asking for a one-paragraph summary. If you are regularly asking for long-form output, things like full draft reports, detailed plans, or lengthy rewrites, that is almost certainly why you are hitting limits faster than people who use the tool for shorter back-and-forth tasks.
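The asymmetry is easy to see with the published Claude 3.5 Sonnet API prices quoted above ($3 per million input tokens, $15 per million output). The token counts below are rough assumptions (a 2,000-word piece at roughly 1.35 tokens per word):

```python
# Published API prices quoted above (Claude 3.5 Sonnet, per million tokens).
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00

def cost(input_tokens: int, output_tokens: int) -> float:
    """API cost in dollars for one exchange."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# A ~2,000-word blog post is roughly 2,700 tokens (assumed ~1.35 tokens/word).
write_post = cost(input_tokens=100, output_tokens=2700)   # short prompt, long output
summarize = cost(input_tokens=2700, output_tokens=150)    # long input, short output
print(f"write from scratch: ${write_post:.4f}, summarize: ${summarize:.4f}")
```

Same document size in both directions, but writing it from scratch costs roughly four times as much as reading it and summarizing, which is the pattern that shows up in your usage allowance too.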

Tips and tricks most people never try

Beyond the basics, here are a few things that make a real difference once you are used to working with agents regularly.

Front-load your instructions. When you start an agent session, put all of your context and requirements in the first message rather than trickling them in over multiple exchanges. This reduces the overall number of turns and keeps the conversation history shorter.

Check what model is being used. On any AI or agent chat, you can often see which model is active. If you are doing something simple or straightforward and the top model is selected, switching to the lighter one before starting will make your session last longer.

Summarize before continuing long sessions. If you are partway through a complex task and nearing your limit, it is sometimes better to stop, capture what has been done, and continue in a fresh session later rather than racing to finish and hitting a wall mid-task. Trying to cram the last step into a nearly-exhausted session often leads to degraded output quality anyway.

Use files instead of chat for background context. If you have a long brief, a style guide, or a set of requirements, save it as a text file in your workspace and tell the agent to read it rather than pasting all of it into the chat. This is often more efficient than repeating context across messages.

Be specific about output. If you want a bullet list, ask for a bullet list at the start. If you want a one-page summary, say one page. Vague requests often generate long responses, and then you ask for a shorter version, and now you have used twice the tokens to get to where you wanted to be.

How models within each platform affect your limits

This is something most people miss entirely. Inside Claude, ChatGPT, and Gemini, there are multiple model tiers, and they do not all consume the same number of tokens for the same work. Choosing the right one for the task at hand is one of the easiest ways to get more out of your plan.

Claude: Haiku, Sonnet, and Opus

Claude offers three model tiers: Haiku, Sonnet, and Opus. Haiku is the lightest and fastest. Sonnet is the default for most users and handles the majority of tasks well. Opus is the most capable for solving more complex problems.

ChatGPT: GPT-5 series

OpenAI follows the same tiered pattern, though their model naming moves faster. As of early 2026, ChatGPT is on the GPT-5 series. GPT-5.3 Instant is the lighter, faster model rolling out broadly. GPT-5.4 is the current flagship available to Plus, Team, and Pro users. GPT-5.4 Thinking and GPT-5.4 Pro are the heavy-duty reasoning tiers and the most token-intensive options in the lineup.

Gemini: Flash and Pro

Google's Gemini follows a similar structure, with Flash and Pro as the main tiers. Flash is fast and lightweight, and is now the default in the Gemini app. Pro delivers deeper reasoning for complex tasks and costs more against your usage quota.

Grok: standard and heavy reasoning

Grok offers a standard model for everyday tasks and a heavier reasoning variant for more complex work. The reasoning model uses significantly more tokens per request, so it is best reserved for tasks that genuinely need it.

Microsoft Copilot: GPT-based tiers

Copilot runs on Microsoft-hosted versions of OpenAI's models, with lighter and more capable options depending on your plan. The same principle applies: the more capable the model, the more it draws from your usage allowance.

Which model should you use? Example tasks

Most people default to the heaviest model available and never touch the lighter ones. You paid for the best, and you don't want a watered-down answer. But for most everyday tasks, that fear is misplaced. The lighter models are not dumbed-down versions of the top one. They are purpose-built for speed and efficiency on tasks that don't require deep reasoning, and they genuinely do just as good a job on those tasks.

The rule of thumb that works in practice: if a task requires you to sit down and think carefully before answering it yourself, use the heavy model. If you would answer it quickly off the top of your head, the lighter model will handle it just fine.

Here is how to think about it across three tiers, regardless of which provider you are on:

Light model: quick, clear, single-step tasks

Use the lightest available model when the task has an obvious answer and doesn't require the AI to weigh competing factors or work through multiple steps. The response quality from the light model on these tasks is virtually identical to the heavy one.

Fix the spelling and grammar in this paragraph.

Reformat this list into a table with two columns: task and deadline.

Translate this email into Spanish.

Summarize this article in three bullet points.

Everyday model: most professional work

The middle tier is where most people should live for the majority of their actual workday. It handles drafting, analysis, research synthesis, and multi-step instructions without burning through your quota the way the heavy model does.

I have an email thread saved as 'supplier-negotiation.txt' on my desktop. Read through it and draft a reply that holds firm on our delivery timeline while leaving room to negotiate on price. Keep it professional and under 200 words.

Read the 12-page strategy document 'q2-plan.pdf' on my desktop and pull out the five most important decisions that need to be made in the next 30 days. For each one, note who owns it and what the deadline is.

Take these five customer reviews and write a one-paragraph product description that honestly reflects what people say they love about it.

Heavy model: genuinely hard problems

The heavy, reasoning-focused model earns its place when the task involves trade-offs, ambiguity, multi-step logic, or when getting it wrong has real consequences. These are the tasks where the lighter models do sometimes fall short, and the extra token cost is worth it.

I'm deciding between three vendors for a $200K contract. Read the proposal documents for each in my desktop folder 'vendor-proposals'. Weigh up their pricing, delivery timelines, risk factors, and alignment with the requirements in 'project-brief.pdf'. Give me a structured recommendation with your reasoning, not just a summary of the documents.

Here are six months of customer churn data in 'churn-data-h2.csv'. What are the leading indicators that a customer is about to leave? I want you to look for patterns that are not obvious, not just the ones I would think to ask about.

Read the contract draft in 'partnership-agreement-draft.pdf'. Identify every clause that creates meaningful financial or legal risk for us, explain what the risk is, and suggest alternative language for each one.

The honest summary: use the light model for anything you could ask an intern to handle in five minutes. Use the everyday model for the majority of your actual work. Save the heavy model for the moments where you genuinely need someone to think hard. That split alone will meaningfully extend how far your plan goes each month.

Plan comparison: what you actually get

Here is a straightforward breakdown of the paid plans across the three main platforms. Note that the specific message counts listed are based on third-party analysis, as Anthropic and OpenAI do not publish exact token limits publicly. Treat these as useful ballparks rather than hard guarantees.

Claude

  • Pro ($20/month) — Around 45 messages per five-hour rolling window, shared across Claude chat, Claude Code, and Cowork. 200K token context window. Ideal for regular professional use with occasional agent sessions.
  • Max ($100/month) — Roughly five times the Pro capacity. For people using Code or Cowork as part of a daily workflow, this is the sweet spot. 200K token context window.
  • Max ($200/month) — Twenty times the Pro capacity. For people running agents all day or teams where multiple people are using Claude heavily.

ChatGPT

  • Plus ($20/month) — Roughly 150 messages per three-hour rolling window on the standard model, with lower caps on the heavier reasoning tiers. 128K token context window. Includes access to Codex for agentic coding and Operator for browser automation.
  • Pro ($200/month) — Effectively unlimited across all models. Context window expands to 256K tokens in certain modes. For power users and professionals running agents continuously.

Gemini

  • AI Plus (~$9.99/month) — Entry-level paid tier with enhanced access to Gemini Pro and a modest increase in daily usage limits over the free plan.
  • AI Pro (~$19.99/month) — Around 100 standard prompts per day, with separate caps for thinking-mode and deep research tasks. 1M token context window. Includes access to Google's Gemini Agent (US only).
  • AI Ultra (~$249.99/month) — The highest tier, with the maximum usage limits across all features including Deep Think mode, Gemini Agent, and video generation. Designed for power users and creative professionals who need the full range of Google's AI capabilities.

Grok

  • SuperGrok ($30/month) and SuperGrok Heavy ($50/month) — xAI's two paid tiers at grok.com, stepping up limits and reasoning depth across Grok's latest models. SuperGrok covers most professional use cases; Heavy is for users who need maximum throughput.

Microsoft Copilot

  • Copilot Pro ($20/month) and Microsoft 365 Copilot ($30/user/month) — Microsoft's individual and business tiers respectively, both offering priority model access and deeper Microsoft 365 integration; the business plan adds enterprise features and admin controls.

ChatGPT wins on raw volume: if you normalize everything to messages per hour at the $20 price point, ChatGPT Plus comes out ahead at roughly 50 exchanges per hour versus Claude Pro's ~9 and Gemini AI Pro's ~4 (although Gemini's cap is measured per day, so the comparison is imperfect). None of these providers publish exact token counts for consumer plans, so a perfect comparison is not possible, but on pure chat volume, ChatGPT Plus gives you the most for your money at this tier.
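The normalization is simple arithmetic on the third-party ballpark figures quoted in the plan breakdown above (so treat the outputs as ballparks too):

```python
# Messages-per-hour comparison at the ~$20 tier, from the third-party
# ballpark figures quoted above -- not official published limits.
claude_pro = 45 / 5      # ~45 messages per 5-hour rolling window
chatgpt_plus = 150 / 3   # ~150 messages per 3-hour rolling window
gemini_pro = 100 / 24    # ~100 prompts per day, spread over 24 hours

print(f"Claude Pro: ~{claude_pro:.0f}/hr, "
      f"ChatGPT Plus: ~{chatgpt_plus:.0f}/hr, "
      f"Gemini AI Pro: ~{gemini_pro:.1f}/hr")
```

Note that the Gemini figure assumes usage spread evenly across a full day, which flatters nobody's real schedule; it is included only to put the daily cap on the same axis.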

That said, most users on Max tier plans rarely hit limits at all, from all providers mentioned. If you have found a provider or an agent you genuinely like and you are bumping into walls regularly, that is a pretty good signal to upgrade. Think about it this way: if an AI agent is saving you two hours of work a day, you are getting thousands of dollars in value every month. The cost of moving from a $20 plan to a $100 or $200 plan is almost always trivial compared to what you are getting back.

Wrapping up

Token limits are not a gotcha. They are a natural result of how much computation goes into these tools, especially agents that are actively doing work on your behalf. The more you understand the mechanics, the less often you will be surprised by them.

The short version: tokens are chunks of text, limits reset on a rolling window rather than at midnight, agents use far more tokens than plain chat, the right plan depends on how heavily you use these tools, and a handful of simple habits can meaningfully stretch how far your plan goes.

If you are new to agents and want to understand more about how they work before diving into usage strategy, start with our intro to AI agents guides. And if you are ready to set up your first agent, our complete setup guide for Claude Cowork walks you through the whole process.

