Code Mode MCP: The Fix for AI Agents Drowning in Their Own Tools
Why the biggest problem with MCP isn’t the protocol — it’s the context window tax. And how Code Mode solves it.
Table of Contents
- What Is MCP? A Quick Refresher
- The MCP Problem: Death by a Thousand Tool Definitions
- Enter Code Mode: Let the AI Write Code, Not Click Buttons
- A Real-World Example: Protecting a Website from DDoS Attacks
- Why Code Mode Works So Well
- When Should You Use Code Mode?
- The Bigger Picture
What Is MCP? A Quick Refresher
Model Context Protocol (MCP) is an open standard that lets AI agents connect to external tools and services. Think of it as a universal adapter: instead of building custom integrations for every app, developers expose their tools as MCP “servers” and any AI agent can plug right in.
An MCP server for GitHub, for example, might expose tools like create_issue, get_pull_request, list_comments, and merge_pull_request. Each tool is essentially a remote function — call it with parameters, get a response.
MCP has been a big deal. It means your AI assistant can read your email, check your calendar, query your database, and update your project board — all through a single, standardized protocol.
So what’s the problem?
The MCP Problem: Death by a Thousand Tool Definitions
Here’s the dirty secret of MCP that nobody talks about at first: every tool you connect eats your AI’s brain.
Let me explain.
When an AI agent connects to MCP servers, it needs to know what tools are available. That means loading every tool’s name, description, parameter schema, and return type into its context window — the fixed-size working memory the model uses to think.
With 5 tools, this is fine. With 50, it’s getting heavy. With 200+, it’s a disaster.
The Numbers Tell the Story
Let’s say you connect three common MCP servers to your agent:
- Microsoft Teams — 10 tools
- Google Drive — 10 tools
- GitHub — 15 tools
That’s 35 tool definitions. Each one might be 500–1,500 tokens of JSON schema. Before your agent has even started thinking about your question, it may have already consumed 20,000–50,000 tokens just loading tool definitions.
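To make the tax concrete, here is a hypothetical tool definition of the kind an MCP server sends at connection time (the shape is illustrative and deliberately small; production schemas run longer), with the common rough heuristic of ~4 characters per token:

```typescript
// A hypothetical MCP tool definition, roughly following JSON Schema.
// Servers send one of these per tool when the agent connects.
const createIssueTool = {
  name: "create_issue",
  description:
    "Create a new issue in a GitHub repository. Requires push access " +
    "to the repository or membership in the owning organization.",
  inputSchema: {
    type: "object",
    properties: {
      owner: { type: "string", description: "Repository owner (user or org)" },
      repo: { type: "string", description: "Repository name without the owner" },
      title: { type: "string", description: "Issue title" },
      body: { type: "string", description: "Issue body as Markdown" },
      labels: {
        type: "array",
        items: { type: "string" },
        description: "Labels to attach to the issue",
      },
    },
    required: ["owner", "repo", "title"],
  },
};

// Crude estimate: ~4 characters per token for English text and JSON.
const estimateTokens = (obj: unknown) =>
  Math.ceil(JSON.stringify(obj).length / 4);

console.log(estimateTokens(createIssueTool), "tokens for one small tool");
```

Even this stripped-down definition costs a couple hundred tokens; multiply a fuller version by every tool on every connected server, and the handshake alone starts to dominate the prompt.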
Real-world examples are even more dramatic. One developer reported that their MCP tools were consuming over 66,000 tokens of context before a single conversation began — a third of Claude’s 200k context window, gone. Another project with 59 tools was eating roughly 45,000–50,000 tokens just from tool definitions alone.
Cloudflare’s API has over 2,500 endpoints. If you exposed each as a traditional MCP tool, you’d consume 1.17 million tokens — more than the entire context window of the most advanced models available today.
The Three-Headed Problem
This “context bloat” causes three cascading failures:
1. Higher costs, slower responses. More input tokens mean more money per API call and longer wait times. A single response can cost 2–3x more when tool descriptions dominate the prompt.
2. Worse reasoning quality. Models don’t get smarter with more context — they get confused. When hundreds of tool definitions crowd the prompt, the model has to spend mental effort deciding what not to use. Similar tool names like get_status, fetch_status, and query_status cause misfires. Accuracy drops. Hallucinated tool calls increase.
3. The “expensive copy-paste” problem. Here’s the most absurd failure mode. Imagine your agent needs to download a meeting transcript (8,000 tokens) from one service and upload it to another. In standard MCP, the agent must:
- Call Tool A → receive 8,000 tokens of transcript
- Those 8,000 tokens pass through the model’s context window
- The model copies that data into the parameters for Tool B
- Call Tool B
You’re paying for an expensive AI model to act as a glorified copy-paste machine. The data doesn’t need to go through the model at all — it just needs to go from point A to point B.
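For contrast, here is the same transfer written as a Code Mode script. The meetings and drive APIs below are hypothetical stand-ins for the two services; the point is that the transcript only ever exists inside the sandbox:

```typescript
// Hypothetical sandbox-side bindings standing in for two MCP servers.
const meetings = {
  async getTranscript(id: string): Promise<string> {
    // In a real system this proxies to the MCP server; the payload
    // never enters the model's context window.
    return "…imagine 8,000 tokens of transcript here…";
  },
};
const drive = {
  async uploadFile(name: string, content: string): Promise<{ ok: boolean }> {
    return { ok: true };
  },
};

// The agent-written script: data flows from A to B inside the sandbox.
async function main() {
  const transcript = await meetings.getTranscript("meeting-id");
  const result = await drive.uploadFile("transcript.txt", transcript);
  // Only this tiny summary goes back to the model:
  console.log({ uploaded: result.ok, characters: transcript.length });
}
main();
```

The model pays for a one-line summary instead of 8,000 tokens passing through twice (once in, once back out as parameters).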
Enter Code Mode: Let the AI Write Code, Not Click Buttons
Code Mode flips the entire approach on its head. Instead of presenting the AI with a massive menu of individual tools to call one at a time, you give it one tool: write and execute code.
The core insight, pioneered by Cloudflare and independently validated by Anthropic, is simple:
LLMs are better at writing code to call tools than at calling tools directly.
This makes intuitive sense. LLMs have been trained on billions of lines of code. Writing a function call in TypeScript is something they’ve seen millions of examples of. The standard MCP tool-calling format? Far fewer training examples.
How It Works — Step by Step
Here’s Code Mode in plain English:
Traditional MCP: “Here are 200 tools. Pick the right one. Fill in the parameters. Call it. Wait. Read the result. Now pick the next tool. Repeat.”
Code Mode: “Here’s a TypeScript API representing your tools. Write a script that does what you need. We’ll run it in a sandbox.”
Let’s walk through a concrete example.
Traditional MCP: Reviewing a Pull Request
Suppose you ask your agent: “Summarize PR #1234 on the vscode repo, including comments and review status.”
With traditional MCP, the agent would need to:
Step 1: Call `github.get_pull_request({owner: "microsoft", repo: "vscode", pull_number: 1234})`
→ Wait for response → Read 2,000 tokens of result through context
Step 2: Call `github.get_pull_request_comments({owner: "microsoft", repo: "vscode", pull_number: 1234})`
→ Wait for response → Read 3,000 tokens of result through context
Step 3: Call `github.get_pull_request_reviews({owner: "microsoft", repo: "vscode", pull_number: 1234})`
→ Wait for response → Read 1,500 tokens of result through context
Step 4: Now synthesize all that data (6,500 tokens of intermediate results consumed)
That’s four round trips through the model, with all intermediate data bloating the context window.
Code Mode: Same Task
With Code Mode, the agent writes a single script:
```typescript
const pr = await github.get_pull_request({
  owner: "microsoft", repo: "vscode", pull_number: 1234
});
const comments = await github.get_pull_request_comments({
  owner: "microsoft", repo: "vscode", pull_number: 1234
});
const reviews = await github.get_pull_request_reviews({
  owner: "microsoft", repo: "vscode", pull_number: 1234
});

console.log({
  title: pr.title,
  status: pr.state,
  commentCount: comments.length,
  reviewSummary: reviews.map(r => ({
    reviewer: r.user.login,
    state: r.state
  }))
});
```
One tool call. One round trip. All three API calls happen inside the sandbox. Only the final, filtered summary comes back to the model — not the raw data from every intermediate step.
The Two-Tool Architecture
Cloudflare’s implementation of Code Mode for their entire API is dead simple. The MCP server exposes exactly two tools:
- `search()` — The agent writes JavaScript to explore the API schema and discover what endpoints exist.
- `execute()` — The agent writes JavaScript to actually call the API, chain operations, handle pagination, and filter results.
That’s it. Two tools. Around 1,000 tokens total. And it provides access to the entire Cloudflare API — all 2,500+ endpoints.
Compare that to the 1.17 million tokens a traditional MCP approach would require. That’s a 99.9% reduction in context usage.
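A minimal sketch of the two-tool pattern (this is not Cloudflare's actual implementation; the tiny spec, search, and execute below are all illustrative, and a real server would run the code string in an isolate rather than new Function):

```typescript
// Toy stand-in for a large OpenAPI spec (the real one has 2,500+ paths).
const spec = {
  paths: {
    "/zones/{zone_id}/rulesets": { get: { summary: "List zone rulesets" } },
    "/zones/{zone_id}/dns_records": { get: { summary: "List DNS records" } },
  },
};

// Tool 1: run agent-written code against the schema to discover endpoints.
async function search(code: string): Promise<unknown> {
  const fn = new Function("spec", `return (${code})(spec);`);
  return await fn(spec);
}

// Tool 2: run agent-written code against a client that performs real calls.
async function execute(code: string, api: object): Promise<unknown> {
  const fn = new Function("api", `return (${code})(api);`);
  return await fn(api);
}

// The model sends a code *string* as the tool argument:
search(`async (spec) =>
  Object.keys(spec.paths).filter((p) => p.includes("rulesets"))
`).then(console.log); // ["/zones/{zone_id}/rulesets"]
```

The spec itself is ordinary data inside the sandbox, so the model can query 2,500 endpoints without a single one of them appearing in its context.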
What Happens Under the Hood
When you use Code Mode, here’s the flow:
- Schema conversion. The MCP server’s tool definitions are converted into a clean TypeScript API with types and doc comments. This TypeScript definition is loaded into the agent’s context — but it’s far more compact than individual tool JSON schemas.
- Code generation. When the agent needs to act, it writes TypeScript code against the API, instead of filling in tool-call JSON.
- Sandboxed execution. The code runs inside a secure, isolated environment (like a Cloudflare Workers isolate or a V8 sandbox). No file system access. No environment variables to leak. No unrestricted internet access. The sandbox can only communicate with the approved MCP tools.
- Filtered results. The code returns results via console.log(). The agent gets back only what it asked for — not the entire raw API response.
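As a rough illustration of the schema-conversion step, a converter might render each JSON-schema tool as a typed declaration. Everything below (the ToolSchema shape, the toTypeScript helper) is a simplified assumption for illustration, not a real library API:

```typescript
// Simplified shape of an incoming MCP tool definition (assumption).
type ToolSchema = {
  name: string;
  description: string;
  inputSchema: {
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
};

// Render one tool as a compact TypeScript declaration.
// Real implementations also handle nested objects, enums, and arrays.
function toTypeScript(tool: ToolSchema): string {
  const params = Object.entries(tool.inputSchema.properties)
    .map(([key, p]) => {
      const optional = tool.inputSchema.required?.includes(key) ? "" : "?";
      const tsType =
        p.type === "integer" || p.type === "number" ? "number" : p.type;
      return `  ${key}${optional}: ${tsType};`;
    })
    .join("\n");
  return `/** ${tool.description} */\ndeclare function ${tool.name}(params: {\n${params}\n}): Promise<unknown>;`;
}

console.log(
  toTypeScript({
    name: "get_pull_request",
    description: "Get a single pull request.",
    inputSchema: {
      properties: {
        owner: { type: "string" },
        repo: { type: "string" },
        pull_number: { type: "integer" },
      },
      required: ["owner", "repo", "pull_number"],
    },
  })
);
```

The resulting declaration carries the same information as the JSON schema in a fraction of the tokens, and in a notation the model has seen far more often in training.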
A Real-World Example: Protecting a Website from DDoS Attacks
Let’s trace through a realistic scenario to see how this works end to end.
A user asks their agent: “Protect my website from DDoS attacks.”
Step 1: Discovery (the search tool)
The agent writes code to search the Cloudflare OpenAPI spec for relevant endpoints:
```javascript
async () => {
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &&
        (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}
```
From 2,500+ endpoints, the agent narrows down to ~10 relevant WAF and ruleset endpoints. The full API spec never enters the model’s context.
Step 2: Schema Inspection
The agent digs deeper into a specific endpoint to understand what parameters it expects:
```javascript
async () => {
  const op = spec.paths['/zones/{zone_id}/rulesets']?.get;
  const items = op?.responses?.['200']?.content?.['application/json']?.schema;
  const props = items?.allOf?.[1]?.properties?.result?.items?.allOf?.[1]?.properties;
  return { phases: props?.phase?.enum };
}
```
This returns the available security phases (ddos_l7, http_request_firewall_managed, etc.) — still without dumping the schema into the context window.
Step 3: Execution
Now the agent writes code to check existing rulesets and enable protection:
```javascript
async () => {
  const response = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets`
  });
  return response.result.map(rs => ({
    name: rs.name, phase: rs.phase, ruleCount: rs.rules?.length
  }));
}
```
Each step is a compact, targeted piece of code. The model reasons about what to do, writes a small program, and the sandbox handles the heavy lifting.
Why Code Mode Works So Well
1. Massive Token Savings
The math is straightforward. Instead of N tool definitions × M tokens each, you load a compact TypeScript API once. Cloudflare’s numbers show a reduction from 1.17 million tokens down to about 1,000 — a 99.9% saving.
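The back-of-envelope arithmetic, with the per-tool average chosen as an assumption so the totals match Cloudflare's published figures:

```typescript
// Assumed average size of one tool definition's JSON schema, in tokens
// (picked so that 2,500 endpoints works out to ~1.17M total tokens).
const endpoints = 2500;
const avgTokensPerTool = 468;

const traditionalTokens = endpoints * avgTokensPerTool; // 1,170,000
const codeModeTokens = 1000; // search() + execute() + compact surface

const saving = 1 - codeModeTokens / traditionalTokens;
console.log({
  traditionalTokens,
  saving: (saving * 100).toFixed(1) + "%", // works out to 99.9%
});
```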
Even in more modest setups, the goose project reported that enabling Code Mode dropped tool definition overhead to roughly 3% of the context window.
2. Intermediate Data Stays Out of the Context
This is the sleeper benefit. In traditional MCP, every tool result flows back through the model. A 10,000-row database query? That’s 10,000 rows in your context window.
With Code Mode, the agent writes code that filters the data in the sandbox:
```typescript
const rows = await database.query("SELECT * FROM orders WHERE date > '2025-01-01'");
const summary = {
  totalOrders: rows.length,
  totalRevenue: rows.reduce((sum, r) => sum + r.amount, 0),
  topProduct: rows.sort((a, b) => b.amount - a.amount)[0]?.product
};
console.log(summary);
```
The model sees three summary values, not 10,000 rows.
3. Multi-Step Operations in a Single Round Trip
Need to check a condition, branch on the result, and loop? In traditional MCP, each step is a separate tool call with a separate round trip through the model. With Code Mode, it’s just… code:
```typescript
let found = false;
while (!found) {
  const messages = await slack.getChannelHistory({ channel: 'C123456' });
  found = messages.some(m => m.text.includes('deployment complete'));
  if (!found) await new Promise(r => setTimeout(r, 5000));
}
console.log('Deployment notification received');
```
Loops, conditionals, error handling — all executed in one shot, using familiar programming patterns that every developer (and every LLM) already knows.
4. Security Through Sandboxing
The generated code runs in a locked-down isolate. No file system. No environment variables. No arbitrary internet access. The only thing the code can do is call the approved MCP tools through their typed APIs. This is actually more secure than giving the model direct tool access, because the execution boundary is explicit and auditable.
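As a toy illustration of that boundary, the sketch below uses Node's built-in vm module. To be clear, vm is explicitly not a real security boundary (production systems use true isolates, as in Cloudflare Workers); what it demonstrates is the key idea that the sandbox's global scope contains only the approved tool bindings, with no fs, no process, and no fetch:

```typescript
import * as vm from "node:vm";

// The ONLY capabilities the agent code receives.
// github here is a hypothetical tool binding; a real one
// would proxy each call to the corresponding MCP server.
const logs: unknown[] = [];
const sandboxGlobals = {
  github: {
    async get_pull_request(_p: { pull_number: number }) {
      return { title: "Fix crash on startup", state: "open" };
    },
  },
  console: { log: (v: unknown) => logs.push(v) },
};
const context = vm.createContext(sandboxGlobals);

async function runAgentCode(code: string): Promise<unknown[]> {
  // Wrap in an async IIFE so the agent's code can use await.
  await vm.runInContext(`(async () => { ${code} })()`, context);
  return logs;
}

// Example agent-written script:
runAgentCode(`
  const pr = await github.get_pull_request({ pull_number: 1234 });
  console.log({ title: pr.title, state: pr.state });
`);
```

Anything the sandbox doesn't explicitly provide simply doesn't exist for the agent's code, which is what makes the boundary auditable.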
When Should You Use Code Mode?
Code Mode isn’t a silver bullet for every situation. Here’s a practical guide:
Code Mode shines when:
- You have many tools (10+) across multiple MCP servers
- Tasks involve chaining multiple tool calls together
- Intermediate results are large (documents, datasets, logs)
- You need loops, conditionals, or data transformation between calls
Traditional MCP is fine when:
- You only have 3–4 tools
- Tasks are simple, single-step operations
- There’s no data passing between tools
- The overhead of code generation would be more than calling the tool directly
The Bigger Picture
Code Mode doesn’t replace MCP — it’s built on top of MCP. Think of the relationship like HTTP and REST: MCP is the underlying protocol that makes tool communication possible, while Code Mode is an architectural pattern for using that protocol more efficiently.
The tools discovered and executed through Code Mode are still MCP tools. The schemas are still MCP schemas. Code Mode just changes how the model interacts with them — from “pick a tool from a menu” to “write a program against a typed API.”
This distinction matters because it means you can adopt Code Mode incrementally. Keep your existing MCP servers. Keep your existing tool definitions. Code Mode simply wraps them in a more efficient interaction layer.
As AI agents grow more capable and connect to more services, the context window tax of traditional tool calling will only get worse. Code Mode is one of the most promising patterns for keeping agents fast, cheap, and accurate as their toolkits scale. This goes beyond optimization — it’s a fundamental rethink of how AI agents should interact with the world.
Code Mode was introduced by Cloudflare and is available as an open-source SDK in the Cloudflare Agents SDK. Anthropic has independently explored the same pattern in their “Code Execution with MCP” research. Implementations are also available in goose (by Block), the UTCP code-mode library, and Palma.ai for enterprise use cases.