# RTK kills the token waste hiding in every AI coding session
## Table of contents
- The actual cost of CLI noise
- RTK: a compression layer between your terminal and your agent
- What this looks like in practice
- Why this matters more than you’d think
- It works with basically everything
- Setup takes about 30 seconds
- What’s coming next
- The bottom line
Here’s something that bothered me for quite a while before I found a fix: every time an AI coding agent runs a shell command, the full output gets dumped into the context window. All of it. The 262-line test suite output where every single test passed. The verbose `git log` with commit metadata you’ll never reference again. The `ls -la` listing with file permissions and timestamps for 40 files.
That output isn’t free. It eats tokens. And tokens are the one resource every AI coding tool rations, whether you’re paying $20/month or $200.
## The actual cost of CLI noise
I didn’t fully appreciate this until I started paying attention to what happens during a typical coding session with an agent. In roughly two hours of back-and-forth, an agent might run 60 shell commands. Each one averages about 3,500 tokens of output. That’s 210,000 tokens of CLI noise alone — enough to nearly fill a 200K context window before the agent even gets to reason about your code.
This is the thing nobody talks about when complaining that Claude Code sessions feel too short, or that Cursor burns through credits too fast. The agent isn’t being wasteful with its reasoning. It’s drowning in `cargo test` boilerplate.
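The session arithmetic above is easy to reproduce. Note that the per-command average and the command count are this article’s estimates, not measured constants:

```python
# Back-of-envelope context budget, using the session figures above
# (60 commands, ~3,500 tokens each -- the article's estimates).
commands_per_session = 60
avg_output_tokens = 3_500
context_window = 200_000  # e.g. a 200K-token model

cli_tokens = commands_per_session * avg_output_tokens
print(cli_tokens)  # 210000
print(f"{cli_tokens / context_window:.0%} of the window")  # 105% of the window
```

In other words, CLI output alone can overrun the entire window before any code or reasoning enters the picture.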
## RTK: a compression layer between your terminal and your agent
RTK (Rust Token Killer) is an open-source CLI proxy that sits between your shell commands and your AI agent’s context window. It intercepts command output and strips out the noise before the agent ever sees it.
The numbers from real usage are hard to argue with:
- `cargo test` with 262 passing tests: 4,823 tokens down to 11. That’s 99% compression. The agent just needs to know everything passed.
- `git diff HEAD~1` on a large change: 21,500 tokens down to 1,259. RTK keeps the meaningful hunks and drops the boilerplate.
- `cat src/main.rs` on a 1,295-line file: 10,176 tokens down to 504. It extracts the structural skeleton (imports, type definitions, function signatures) and collapses the rest.
Across 2,900+ real commands measured by the project, the average compression rate sits at 89%.
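Those per-command rates follow directly from the before/after token counts quoted above; a quick sketch:

```python
# Compression rate = fraction of tokens removed, from the figures above.
samples = {
    "cargo test":      (4_823, 11),
    "git diff HEAD~1": (21_500, 1_259),
    "cat src/main.rs": (10_176, 504),
}
for cmd, (before, after) in samples.items():
    print(f"{cmd}: {1 - after / before:.1%} removed")
```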
## What this looks like in practice
Take `git status`. The standard output runs about 120 tokens:
```
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   index.html
        modified:   src/main.rs
        modified:   src/config.rs

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .fastembed_cache/
        tests/

no changes added to commit (use "git add" and/or "git commit -a")
```
RTK compresses it to ~30 tokens:
```
master...origin/master
Modified: 3 files
  index.html
  src/main.rs
  src/config.rs
Untracked: 2 files
  .fastembed_cache/
  tests/
```
Same information. 75% less noise. The hint text (“use git add…”), the branch tracking boilerplate, the instructional lines — none of it helps the agent write better code. RTK throws it away.
The effect compounds. `pytest` output with 33 passing tests goes from 756 tokens to 24. A `grep -rn "pub fn"` scan across a Rust codebase drops from 2,108 tokens to 940. An `ls -la` listing shrinks from 3,200 tokens to 640.
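To make the kind of filtering concrete, here is a toy version of the `git status` case in Python. This is a hypothetical sketch, not RTK’s actual implementation (RTK is written in Rust and does much more):

```python
def compress_git_status(raw: str) -> str:
    """Toy noise filter: drop git's instructional hint lines and blanks,
    keep the facts (branch state, file lists). Not RTK's real logic."""
    noise_prefixes = ('(use "git', "no changes added")
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "(use git add ...)" style hints.
        if not stripped or stripped.startswith(noise_prefixes):
            continue
        kept.append(stripped)
    return "\n".join(kept)
```

Feeding the long-form `git status` output above through this drops the three hint lines and the trailing instructional sentence; real RTK goes further, for example collapsing the branch-tracking sentence into `master...origin/master`.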
## Why this matters more than you’d think
Context windows aren’t just about fitting more text. When an agent’s context is 70% test runner boilerplate and git metadata, the signal-to-noise ratio tanks. The model has to sift through thousands of irrelevant tokens to find the information it actually needs for reasoning. Less noise means better answers.
There’s also the session length problem. On Claude Code’s Pro plan (~45 messages per 5 hours), context overflow forces restarts, and you lose the conversational thread — the agent forgets what it was doing, what it already tried, what your preferences are. RTK users report sessions lasting roughly 3x longer before hitting that wall.
And if you’re on a pay-per-token setup like Aider or Gemini CLI, the savings are direct. A team of 10 developers doing CLI-heavy work can waste around $1,750/month on tokens the model doesn’t need. Cut 89% of that and the math gets interesting.
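Taking the article’s $1,750/month waste estimate and the 89% average compression rate at face value, the saving works out as follows (these inputs are the article’s figures, not measured data):

```python
# Hedged estimate: both inputs are the article's figures.
monthly_waste_usd = 1_750   # team of 10 doing CLI-heavy work
avg_compression = 0.89      # RTK's reported average rate
saved = monthly_waste_usd * avg_compression
print(f"~${saved:,.0f}/month recovered")
```

That is roughly $1,550 a month back, before counting the softer wins like longer sessions and cleaner context.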
## It works with basically everything
RTK covers the tools people actually use:
| Tool | Pricing | How RTK helps |
|---|---|---|
| Claude Code | $20-$200/mo | Sessions ~3x longer, quota stretches further |
| Cursor | $20-$200/mo | Credits go ~2x further |
| OpenAI Codex | $20-$200/mo | More iterations before hitting the message cap |
| Windsurf | $15-$60/mo | Credits last ~2x longer |
| Gemini CLI | Free / pay-per-token | ~70% lower token bills |
| Aider | Free + API costs | ~70% less API spend |
| GitHub Copilot | Free-$39/mo | Better context quality for premium requests |
| Cline / Roo | Free + API costs | ~70% less API cost |
The supported command list covers 30+ tools: git operations, test runners (`cargo test`, `pytest`, `go test`), file operations (`ls`, `find`, `grep`, `cat`/read), package managers (`npm`, `cargo`), Docker, kubectl, curl, and more.
## Setup takes about 30 seconds
Install:
```sh
# One-liner
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

# Or Homebrew
brew install rtk
```
Activate the auto-rewrite hook:
```sh
rtk init --global
```
That’s it. The `init` command installs a PreToolUse hook that transparently rewrites Bash commands to their RTK equivalents. You don’t change how you work. The agent keeps running `git status` and `cargo test` like normal — RTK intercepts and compresses the output before it reaches the context window.
To see your savings over time:
```sh
rtk gain
```
One developer reported 15,720 commands processed with 138 million tokens saved at 88.9% efficiency after a few weeks of daily use.
## What’s coming next
The team is building RTK Cloud — a dashboard for teams to track AI coding costs across developers and projects. Token analytics per dev, savings reports (“your team saved $4,200 this month”), rate limit monitoring, and enterprise controls like SSO and audit logs. Pricing starts at $15/dev/month, free for open source.
## The bottom line
I keep coming back to a simple observation: no AI coding tool offers truly unlimited usage. Claude Code has message quotas. Cursor has credits. Aider charges per token. Even Gemini CLI’s free tier has rate limits. Every wasted token is a small tax on your productivity.
RTK doesn’t change how your agent works. It just makes sure the agent reads what matters and skips what doesn’t. Written in Rust, MIT-licensed, 450+ stars on GitHub. Developers at Apple, Google, Meta, Microsoft, AWS, and a bunch of other companies have starred the repo.
Worth trying. Worst case, you lose 30 seconds on the install. Best case, you stop wondering why your agent keeps forgetting what it was doing.
Links:
- Website: rtk-ai.app
- GitHub: github.com/rtk-ai/rtk