Most people blame Claude for its strict limits, and the blame is justified to an extent. But until Anthropic eases those limits, you're better off optimizing your token usage.

Using tokens wisely sounds simple, yet few people actually know how, and they end up wasting a lot of tokens and money as a result.

Here are some of my habits that will save you a ton of tokens.

But First — What Is a Token?

A token is the unit Claude uses to measure text. It’s not a word, and it’s not a character — it’s somewhere in between.

Roughly, 1 token ≈ 3/4 of a word. A short, common word like “the” is one token. A longer word like “optimization” might be split into two or three tokens.

Here’s a quick example. Take this sentence:

"Claude counts tokens, not messages."

That breaks down into 7 tokens: Claude | counts | tokens | , | not | messages | .

A typical paragraph of ~100 words costs around 130–140 tokens. Every token you send and every token Claude sends back counts toward your usage limit. That’s why all the tips below focus on reducing unnecessary tokens on both sides of the conversation.
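The ~3/4-of-a-word rule of thumb above is easy to turn into a quick estimator. This is only an approximation of the idea, not Claude's actual tokenizer, which splits text differently:

```python
# Rough token estimator using the ~0.75 words-per-token rule of thumb.
# Claude's real tokenizer will give different exact counts.

def estimate_tokens(text: str) -> int:
    """Estimate tokens as word count divided by 0.75, rounded."""
    words = len(text.split())
    return round(words / 0.75)

paragraph = " ".join(["word"] * 100)   # a ~100-word paragraph
print(estimate_tokens(paragraph))      # ~133, in the 130-140 range cited above
```

Handy for eyeballing how much of your limit a long paste will consume before you send it.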

Table of Contents

  1. Edit Your Prompt. Don’t Send a Follow-Up
  2. Start a Fresh Chat Every 15–20 Messages
  3. Batch Your Questions Into One Message
  4. Upload Recurring Files to Projects
  5. Set Up Memory & User Preferences
  6. Turn Off Features You’re Not Actively Using
  7. Use Haiku for Simple Tasks
  8. Spread Your Work Across the Day
  9. Work During Off-Peak Hours
  10. Use RTK (Rust Token Killer) to Compress Shell Output

1. Edit Your Prompt. Don’t Send a Follow-Up

When Claude doesn’t get your thoughts right, you might feel tempted to send:

  • “No, I meant [your message]”
  • “Ugh, that’s not what I wanted [your message]”

Don’t do that!

Every subsequent message is added to the conversation history. Claude re-reads ALL of it every turn — burning tokens on context that didn’t even help.

Token cost per message = all previous messages + your new one.

Total = S × N(N+1) / 2

(S = avg tokens per exchange, N = message count)

At ~500 tokens per exchange:

| Messages | Tokens Burned |
| --- | --- |
| 5 | 7.5K |
| 10 | 27.5K |
| 20 | 105K |
| 30 | 232K |

Message 30 costs about 30 times as much as message 1.
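The formula above is easy to verify in a few lines. Turn k re-reads all k exchanges, so the total after N messages is the sum 1 + 2 + … + N, times S:

```python
# Cumulative token cost when every turn re-reads the full history.
# Turn k costs S * k, so the total after N messages is S * N * (N + 1) / 2.

def total_tokens(s: int, n: int) -> int:
    """Total tokens burned after n messages at s tokens per exchange."""
    return s * n * (n + 1) // 2

for n in (5, 10, 20, 30):
    print(n, total_tokens(500, n))   # 7500, 27500, 105000, 232500
```

These are the numbers in the table above; the 30-message figure rounds to ~232K.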

Instead: click Edit on your original message → fix it → regenerate. The old exchange gets replaced, not stacked.

2. Start a Fresh Chat Every 15–20 Messages

In the previous section, I showed how token costs grow with every message. Ideally, you should start a new chat every 15–20 messages.

Now imagine a chat with 100+ messages. At ~500 tokens per exchange, that’s over 2.5 million tokens burned — most of it just re-reading old history.

One developer tracked his usage and found that 98.5% of tokens were spent on re-reading the history. Only 1.5% went toward actually outputting the result.

When a chat gets long → ask Claude to summarize everything → copy it → new chat → paste as first message.

3. Batch Your Questions Into One Message

Many people believe that splitting questions into separate messages leads to better results. Almost always, the opposite is true.

  • Three separate prompts = three context loads.
  • One prompt with three tasks = one context load.

You save tokens twice: fewer context reloads, and you stay further from hitting your limit.

Instead of:

"Summarize this article"
"Now list the main points"
"Now suggest a headline"

Write:

"Summarize this article, list the main points, and suggest a headline."

Bonus: the answers often turn out better because Claude immediately sees the full picture.

Three questions. One prompt. Always!
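The savings are easy to see with back-of-envelope arithmetic. The token counts below are illustrative assumptions, not measurements:

```python
# Illustrative comparison: three separate prompts vs one batched prompt
# over the same pasted article. All token counts here are assumptions.

ARTICLE = 2000   # tokens in the pasted article
QUESTION = 20    # tokens per question
ANSWER = 300     # tokens per answer

# Separate: each turn re-reads the article plus all earlier Q&A.
separate = 0
history = ARTICLE
for _ in range(3):
    history += QUESTION
    separate += history      # input tokens processed this turn
    history += ANSWER

# Batched: the article is read once, with all three questions attached.
batched = ARTICLE + 3 * QUESTION

print(separate, batched)     # 7020 vs 2060 input tokens
```

Even before Claude writes a single word of output, the separate-prompt approach processes more than three times as many input tokens.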

4. Upload Recurring Files to Projects

If you upload the same PDF to multiple chats, Claude re-tokenizes that document every single time.

Use the Projects feature instead. Upload your file once → it gets cached. Every new conversation inside that project references it without burning tokens again.

Cached project content doesn’t eat into your usage when you access it repeatedly.

If you work with contracts, briefs, style guides, or any long docs — this alone could cut your token spend dramatically.

5. Set Up Memory & User Preferences

Every new chat without saved context wastes 3–5 messages on setup: “I’m a marketer, I write in a casual style, I prefer short paragraphs…”

You’ve probably seen people start every prompt with “Act as a…” — that’s tokens burned on repeat.

Claude can remember this permanently.

Go to Settings → Memory and User Settings. Save your role, communication style, and preferences once. Claude will automatically apply them to every new chat.

6. Turn Off Features You’re Not Actively Using

Web search, connectors, and “Explore” mode — all of these add tokens to every response, even if you don’t need them.

Writing your own content? Turn off the Search and Tools feature.

The Advanced Thinking feature also consumes tokens. Keep it turned off by default. Only turn it on if your first attempt was unsatisfactory.

Rule: if you didn’t turn a feature on intentionally, turn it off.

7. Use Haiku for Simple Tasks

Grammar checking, brainstorming, formatting, quick translations, short answers — Haiku handles all of this at a much lower cost than Sonnet or Opus.

Choosing the right model is the most important decision you make every day. Haiku for drafts and simple tasks → frees up 50–70% of your budget for tasks that truly require powerful models.

Mental model:

| Model | Use Case | Cost |
| --- | --- | --- |
| Haiku | Quick tasks | Low |
| Sonnet | Real work | Medium |
| Opus | Deep thinking | High |

You don’t need powerful models for simple tasks!
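If you script against the API, the same mental model can live in code. A minimal routing sketch, where the task categories and the "haiku"/"sonnet"/"opus" labels are placeholders you'd map to real model IDs:

```python
# Minimal task-to-model router sketch. The category sets and the model
# labels are assumptions; map them to the actual model IDs you use.

SIMPLE = {"grammar", "formatting", "translation", "brainstorm", "short_answer"}
HEAVY = {"architecture", "deep_analysis", "long_refactor"}

def pick_model(task_type: str) -> str:
    if task_type in SIMPLE:
        return "haiku"     # quick, low-cost tasks
    if task_type in HEAVY:
        return "opus"      # deep reasoning, highest cost
    return "sonnet"        # the default for real work

print(pick_model("grammar"))      # haiku
print(pick_model("code_review"))  # sonnet
```

The point is to make Haiku the default and escalate deliberately, rather than reaching for the most powerful model out of habit.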

8. Spread Your Work Across the Day

Claude uses a rolling 5-hour window, not a limit that resets at midnight: each message stops counting against your limit five hours after you send it.

Messages sent at 9 a.m. will no longer count by 2 p.m.

If you burn your entire limit in a single morning session, you're locked out until those early messages age out of the window.

Divide your day into 2–3 sessions: morning, afternoon, and evening. By the time you return, your earlier usage has rolled off and you have fresh capacity.
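The rolling window can be sketched in a few lines. This assumes messages simply stop counting five hours after they're sent; Anthropic's real accounting may differ in detail:

```python
# Sketch of a rolling 5-hour usage window (assumption: each message
# ages out exactly 5 hours after it was sent).
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)

def counted(messages: list[datetime], now: datetime) -> int:
    """How many messages still fall inside the rolling window at `now`."""
    return sum(1 for t in messages if now - t < WINDOW)

sent = [datetime(2026, 4, 1, 9, 0)] * 10    # ten messages at 9 a.m.
print(counted(sent, datetime(2026, 4, 1, 13, 0)))   # 10 -> still counted
print(counted(sent, datetime(2026, 4, 1, 14, 30)))  # 0 -> aged out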

9. Work During Off-Peak Hours

Starting March 26, 2026, Anthropic will draw down your 5-hour session limit more quickly during peak hours:

5:00 AM to 11:00 AM Pacific Time / 8:00 AM to 2:00 PM Eastern Time on weekdays.

Same query, same chat — but during peak hours, it impacts your limit more. Your weekly limit remains the same, but how it’s distributed has changed.

Running resource-intensive tasks in the evening or on weekends will significantly stretch your plan.

If you’re outside the U.S. (in Europe, Latin America, or Asia), peak hours may actually fall during your afternoon, so check the calculation based on time zones.
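Python's standard `zoneinfo` module (3.9+) makes that check trivial. Berlin is just an example; swap in your own zone. Note that 5–11 a.m. Pacific lands squarely in a European afternoon:

```python
# Convert the stated Pacific-time peak window into another time zone.
# Berlin is an example; substitute your own IANA zone name.
from datetime import datetime
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")
peak_start = datetime(2026, 3, 30, 5, 0, tzinfo=pacific)   # a weekday after the change
peak_end = datetime(2026, 3, 30, 11, 0, tzinfo=pacific)

local = ZoneInfo("Europe/Berlin")
print(peak_start.astimezone(local).strftime("%H:%M"))  # 14:00
print(peak_end.astimezone(local).strftime("%H:%M"))    # 20:00
```

In Central Europe, the peak window runs from early afternoon into the evening, so the cheap hours are your morning and late night.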

10. Use RTK (Rust Token Killer) to Compress Shell Output

Every time Claude Code runs a shell command — git log, cat, npm test, grep — the full, unfiltered output gets dumped into the context window. The model has to process all of it. And most of it is noise: commit metadata, GPG signatures, passing test boilerplate, decorative separators.

One developer tracked a typical 30-minute session and found 150,000 tokens burned — not on reasoning or code generation, but on reading command output.

RTK is a single Rust binary that sits between your AI agent and the shell. It intercepts command output and applies four compression strategies before the tokens hit the context window:

  1. Smart Filtering — strips comments, blank lines, boilerplate headers, and metadata the model will never act on.
  2. Grouping — aggregates similar items. Instead of listing 47 .tsx files individually, RTK outputs: components/ (47 .tsx files).
  3. Truncation — preserves the head and tail of long outputs while cutting the redundant middle. 200 passing tests become a count and the 3 failures.
  4. Deduplication — collapses repeated entries. Five identical warning lines become one line with (×5).
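Two of these strategies are simple enough to illustrate in a few lines. This is a toy sketch of the idea, not RTK's actual implementation:

```python
# Toy versions of two RTK strategies (deduplication and truncation).
# Not RTK's code; just the underlying idea in Python.
from itertools import groupby

def dedupe(lines: list[str]) -> list[str]:
    """Collapse consecutive duplicate lines into 'line (xN)'."""
    out = []
    for line, grp in groupby(lines):
        n = len(list(grp))
        out.append(line if n == 1 else f"{line} (x{n})")
    return out

def truncate(lines: list[str], head: int = 3, tail: int = 3) -> list[str]:
    """Keep the head and tail of long output; cut the redundant middle."""
    if len(lines) <= head + tail:
        return lines
    cut = len(lines) - head - tail
    return lines[:head] + [f"... {cut} lines omitted ..."] + lines[-tail:]

print(dedupe(["warning: unused import"] * 5))   # one line with (x5)
print(truncate([f"test {i} ok" for i in range(200)]))  # 7 lines total
```

Five identical warnings become one line; 200 lines of passing-test output shrink to seven, with nothing the model actually needs removed.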

Real-world savings by operation:

| Operation | Raw Tokens | With RTK | Savings |
| --- | --- | --- | --- |
| git log | ~4,200 | ~840 | 80% |
| git diff | ~12,000 | ~960 | 92% |
| npm test / pytest | ~6,000 | ~600 | 90% |
| grep / rg (search) | ~8,500 | ~1,700 | 80% |
| cat / file reads | ~3,500 | ~1,050 | 70% |

The same 30-minute session drops from 150K tokens to about 45K — a 70% reduction for the exact same work.

Setup takes about 60 seconds. RTK’s hook-first architecture means Claude Code doesn’t need to prefix commands with rtk — a single init command installs a transparent hook that rewrites operations automatically. No workflow changes, no friction.

The secondary benefit is context window quality. When the window isn’t bloated with raw command output, the model’s attention stays focused. More room for actual code, actual reasoning — and noticeably better results.

Conclusion

At first, following all these rules will take real effort, but once they become automatic, you'll hit your limits far later, or not at all.

“When working with LLMs, think in tokens.” — Rushi