Code Mode MCP: The Fix for AI Agents Drowning in Their Own Tools
Why the biggest problem with MCP isn’t the protocol — it’s the context window tax. And how Code Mode solves it.
Table of Contents
- What Is MCP? A Quick Refresher
- The MCP Problem: Death by a Thousand Tool Definitions
- Enter Code Mode: Let the AI Write Code, Not Click Buttons
- A Real-World Example: Protecting a Website from DDoS Attacks
- Why Code Mode Works So Well
- When Should You Use Code Mode?
- The Bigger Picture
What Is MCP? A Quick Refresher
Model Context Protocol (MCP) is an open standard that lets AI agents connect to external tools and services. Think of it as a universal adapter: instead of building custom integrations for every app, developers expose their tools as MCP “servers” and any AI agent can plug right in.
An MCP server for GitHub, for example, might expose tools like create_issue, get_pull_request, list_comments, and merge_pull_request. Each tool is essentially a remote function — call it with parameters, get a response.
MCP has been a big deal. It means your AI assistant can read your email, check your calendar, query your database, and update your project board — all through a single, standardized protocol.
So what’s the problem?
The MCP Problem: Death by a Thousand Tool Definitions
Here’s the dirty secret of MCP that nobody talks about at first: every tool you connect eats your AI’s brain.
Let me explain.
When an AI agent connects to MCP servers, it needs to know what tools are available. That means loading every tool’s name, description, parameter schema, and return type into its context window — the fixed-size working memory the model uses to think.
With 5 tools, this is fine. With 50, it’s getting heavy. With 200+, it’s a disaster.
The Numbers Tell the Story
Let’s say you connect three common MCP servers to your agent:
- Microsoft Teams — 10 tools
- Google Drive — 10 tools
- GitHub — 15 tools
That’s 35 tool definitions. Each one might be 500–1,500 tokens of JSON schema. Before your agent has even started thinking about your question, it may have already consumed 20,000–50,000 tokens just loading tool definitions.
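To make the tax concrete, here is a hypothetical tool definition of the kind an MCP server sends at connection time (the shape is illustrative and deliberately small; production schemas run longer), with the common rough heuristic of ~4 characters per token:

```typescript
// A hypothetical MCP tool definition, roughly following JSON Schema.
// Servers send one of these per tool when the agent connects.
const createIssueTool = {
  name: "create_issue",
  description:
    "Create a new issue in a GitHub repository. Requires push access " +
    "to the repository or membership in the owning organization.",
  inputSchema: {
    type: "object",
    properties: {
      owner: { type: "string", description: "Repository owner (user or org)" },
      repo: { type: "string", description: "Repository name without the owner" },
      title: { type: "string", description: "Issue title" },
      body: { type: "string", description: "Issue body as Markdown" },
      labels: {
        type: "array",
        items: { type: "string" },
        description: "Labels to attach to the issue",
      },
    },
    required: ["owner", "repo", "title"],
  },
};

// Crude estimate: ~4 characters per token for English text and JSON.
const estimateTokens = (obj: unknown) =>
  Math.ceil(JSON.stringify(obj).length / 4);

console.log(estimateTokens(createIssueTool), "tokens for one small tool");
```

Even this stripped-down definition costs a couple hundred tokens; multiply a fuller version by every tool on every connected server, and the handshake alone starts to dominate the prompt.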
Real-world examples are even more dramatic. One developer reported that their MCP tools were consuming over 66,000 tokens of context before a single conversation began — a third of Claude’s 200k context window, gone. Another project with 59 tools was eating roughly 45,000–50,000 tokens just from tool definitions alone.
Cloudflare’s API has over 2,500 endpoints. If you exposed each as a traditional MCP tool, you’d consume 1.17 million tokens — more than the entire context window of the most advanced models available today.
The Three-Headed Problem
This “context bloat” causes three cascading failures:
1. Higher costs, slower responses. More input tokens mean more money per API call and longer wait times. A single response can cost 2–3x more when tool descriptions dominate the prompt.
2. Worse reasoning quality. Models don’t get smarter with more context — they get confused. When hundreds of tool definitions crowd the prompt, the model has to spend mental effort deciding what not to use. Similar tool names like get_status, fetch_status, and query_status cause misfires. Accuracy drops. Hallucinated tool calls increase.
3. The “expensive copy-paste” problem. Here’s the most absurd failure mode. Imagine your agent needs to download a meeting transcript (8,000 tokens) from one service and upload it to another. In standard MCP, the agent must:
- Call Tool A → receive 8,000 tokens of transcript
- Those 8,000 tokens pass through the model’s context window
- The model copies that data into the parameters for Tool B
- Call Tool B
You’re paying for an expensive AI model to act as a glorified copy-paste machine. The data doesn’t need to go through the model at all — it just needs to go from point A to point B.
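For contrast, here is the same transfer written as a Code Mode script. The meetings and drive APIs below are hypothetical stand-ins for the two services; the point is that the transcript only ever exists inside the sandbox:

```typescript
// Hypothetical sandbox-side bindings standing in for two MCP servers.
const meetings = {
  async getTranscript(id: string): Promise<string> {
    // In a real system this proxies to the MCP server; the payload
    // never enters the model's context window.
    return "…imagine 8,000 tokens of transcript here…";
  },
};
const drive = {
  async uploadFile(name: string, content: string): Promise<{ ok: boolean }> {
    return { ok: true };
  },
};

// The agent-written script: data flows from A to B inside the sandbox.
async function main() {
  const transcript = await meetings.getTranscript("meeting-id");
  const result = await drive.uploadFile("transcript.txt", transcript);
  // Only this tiny summary goes back to the model:
  console.log({ uploaded: result.ok, characters: transcript.length });
}
main();
```

The model pays for a one-line summary instead of 8,000 tokens passing through twice (once in, once back out as parameters).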
Enter Code Mode: Let the AI Write Code, Not Click Buttons
Code Mode flips the entire approach on its head. Instead of presenting the AI with a massive menu of individual tools to call one at a time, you give it one tool: write and execute code.
The core insight, pioneered by Cloudflare and independently validated by Anthropic, is simple:
LLMs are better at writing code to call tools than at calling tools directly.
This makes intuitive sense. LLMs have been trained on billions of lines of code. Writing a function call in TypeScript is something they’ve seen millions of examples of. The standard MCP tool-calling format? Far fewer training examples.
How It Works — Step by Step
Here’s Code Mode in plain English:
Traditional MCP: “Here are 200 tools. Pick the right one. Fill in the parameters. Call it. Wait. Read the result. Now pick the next tool. Repeat.”
Code Mode: “Here’s a TypeScript API representing your tools. Write a script that does what you need. We’ll run it in a sandbox.”
Let’s walk through a concrete example.
Traditional MCP: Reviewing a Pull Request
Suppose you ask your agent: “Summarize PR #1234 on the vscode repo, including comments and review status.”
With traditional MCP, the agent would need to:
Step 1: Call `github.get_pull_request({owner: "microsoft", repo: "vscode", pull_number: 1234})`
→ Wait for response → Read 2,000 tokens of result through context
Step 2: Call `github.get_pull_request_comments({owner: "microsoft", repo: "vscode", pull_number: 1234})`
→ Wait for response → Read 3,000 tokens of result through context
Step 3: Call `github.get_pull_request_reviews({owner: "microsoft", repo: "vscode", pull_number: 1234})`
→ Wait for response → Read 1,500 tokens of result through context
Step 4: Now synthesize all that data (6,500 tokens of intermediate results consumed)
That’s four round trips through the model, with all intermediate data bloating the context window.
Code Mode: Same Task
With Code Mode, the agent writes a single script:
```typescript
const pr = await github.get_pull_request({
  owner: "microsoft", repo: "vscode", pull_number: 1234
});
const comments = await github.get_pull_request_comments({
  owner: "microsoft", repo: "vscode", pull_number: 1234
});
const reviews = await github.get_pull_request_reviews({
  owner: "microsoft", repo: "vscode", pull_number: 1234
});

console.log({
  title: pr.title,
  status: pr.state,
  commentCount: comments.length,
  reviewSummary: reviews.map(r => ({
    reviewer: r.user.login,
    state: r.state
  }))
});
```
One tool call. One round trip. All three API calls happen inside the sandbox. Only the final, filtered summary comes back to the model — not the raw data from every intermediate step.
The Two-Tool Architecture
Cloudflare’s implementation of Code Mode for their entire API is dead simple. The MCP server exposes exactly two tools:
- `search()` — The agent writes JavaScript to explore the API schema and discover what endpoints exist.
- `execute()` — The agent writes JavaScript to actually call the API, chain operations, handle pagination, and filter results.
That’s it. Two tools. Around 1,000 tokens total. And it provides access to the entire Cloudflare API — all 2,500+ endpoints.
Compare that to the 1.17 million tokens a traditional MCP approach would require. That’s a 99.9% reduction in context usage.
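A minimal sketch of the two-tool pattern (this is not Cloudflare's actual implementation; the tiny spec, search, and execute below are all illustrative, and a real server would run the code string in an isolate rather than new Function):

```typescript
// Toy stand-in for a large OpenAPI spec (the real one has 2,500+ paths).
const spec = {
  paths: {
    "/zones/{zone_id}/rulesets": { get: { summary: "List zone rulesets" } },
    "/zones/{zone_id}/dns_records": { get: { summary: "List DNS records" } },
  },
};

// Tool 1: run agent-written code against the schema to discover endpoints.
async function search(code: string): Promise<unknown> {
  const fn = new Function("spec", `return (${code})(spec);`);
  return await fn(spec);
}

// Tool 2: run agent-written code against a client that performs real calls.
async function execute(code: string, api: object): Promise<unknown> {
  const fn = new Function("api", `return (${code})(api);`);
  return await fn(api);
}

// The model sends a code *string* as the tool argument:
search(`async (spec) =>
  Object.keys(spec.paths).filter((p) => p.includes("rulesets"))
`).then(console.log); // ["/zones/{zone_id}/rulesets"]
```

The spec itself is ordinary data inside the sandbox, so the model can query 2,500 endpoints without a single one of them appearing in its context.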
What Happens Under the Hood
When you use Code Mode, here’s the flow:
- Schema conversion. The MCP server’s tool definitions are converted into a clean TypeScript API with types and doc comments. This TypeScript definition is loaded into the agent’s context — but it’s far more compact than individual tool JSON schemas.
- Code generation. When the agent needs to act, it writes TypeScript code against the API, instead of filling in tool-call JSON.
- Sandboxed execution. The code runs inside a secure, isolated environment (like a Cloudflare Workers isolate or a V8 sandbox). No file system access. No environment variables to leak. No unrestricted internet access. The sandbox can only communicate with the approved MCP tools.
- Filtered results. The code returns results via console.log(). The agent gets back only what it asked for — not the entire raw API response.
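As a rough illustration of the schema-conversion step, a converter might render each JSON-schema tool as a typed declaration. Everything below (the ToolSchema shape, the toTypeScript helper) is a simplified assumption for illustration, not a real library API:

```typescript
// Simplified shape of an incoming MCP tool definition (assumption).
type ToolSchema = {
  name: string;
  description: string;
  inputSchema: {
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
};

// Render one tool as a compact TypeScript declaration.
// Real implementations also handle nested objects, enums, and arrays.
function toTypeScript(tool: ToolSchema): string {
  const params = Object.entries(tool.inputSchema.properties)
    .map(([key, p]) => {
      const optional = tool.inputSchema.required?.includes(key) ? "" : "?";
      const tsType =
        p.type === "integer" || p.type === "number" ? "number" : p.type;
      return `  ${key}${optional}: ${tsType};`;
    })
    .join("\n");
  return `/** ${tool.description} */\ndeclare function ${tool.name}(params: {\n${params}\n}): Promise<unknown>;`;
}

console.log(
  toTypeScript({
    name: "get_pull_request",
    description: "Get a single pull request.",
    inputSchema: {
      properties: {
        owner: { type: "string" },
        repo: { type: "string" },
        pull_number: { type: "integer" },
      },
      required: ["owner", "repo", "pull_number"],
    },
  })
);
```

The resulting declaration carries the same information as the JSON schema in a fraction of the tokens, and in a notation the model has seen far more often in training.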
A Real-World Example: Protecting a Website from DDoS Attacks
Let’s trace through a realistic scenario to see how this works end to end.
A user asks their agent: “Protect my website from DDoS attacks.”
Step 1: Discovery (the search tool)
The agent writes code to search the Cloudflare OpenAPI spec for relevant endpoints:
```javascript
async () => {
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &&
        (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}
```
From 2,500+ endpoints, the agent narrows down to ~10 relevant WAF and ruleset endpoints. The full API spec never enters the model’s context.
Step 2: Schema Inspection
The agent digs deeper into a specific endpoint to understand what parameters it expects:
```javascript
async () => {
  const op = spec.paths['/zones/{zone_id}/rulesets']?.get;
  const items = op?.responses?.['200']?.content?.['application/json']?.schema;
  const props = items?.allOf?.[1]?.properties?.result?.items?.allOf?.[1]?.properties;
  return { phases: props?.phase?.enum };
}
```
This returns the available security phases (ddos_l7, http_request_firewall_managed, etc.) — still without dumping the schema into the context window.
Step 3: Execution
Now the agent writes code to check existing rulesets and enable protection:
```javascript
async () => {
  const response = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets`
  });
  return response.result.map(rs => ({
    name: rs.name, phase: rs.phase, ruleCount: rs.rules?.length
  }));
}
```
Each step is a compact, targeted piece of code. The model reasons about what to do, writes a small program, and the sandbox handles the heavy lifting.
Why Code Mode Works So Well
1. Massive Token Savings
The math is straightforward. Instead of N tool definitions × M tokens each, you load a compact TypeScript API once. Cloudflare’s numbers show a reduction from 1.17 million tokens down to about 1,000 — a 99.9% saving.
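The back-of-envelope arithmetic, with the per-tool average chosen as an assumption so the totals match Cloudflare's published figures:

```typescript
// Assumed average size of one tool definition's JSON schema, in tokens
// (picked so that 2,500 endpoints works out to ~1.17M total tokens).
const endpoints = 2500;
const avgTokensPerTool = 468;

const traditionalTokens = endpoints * avgTokensPerTool; // 1,170,000
const codeModeTokens = 1000; // search() + execute() + compact surface

const saving = 1 - codeModeTokens / traditionalTokens;
console.log({
  traditionalTokens,
  saving: (saving * 100).toFixed(1) + "%", // works out to 99.9%
});
```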
Even in more modest setups, the goose project reported that enabling Code Mode dropped tool definition overhead to roughly 3% of the context window.
2. Intermediate Data Stays Out of the Context
This is the sleeper benefit. In traditional MCP, every tool result flows back through the model. A 10,000-row database query? That’s 10,000 rows in your context window.
With Code Mode, the agent writes code that filters the data in the sandbox:
```typescript
const rows = await database.query("SELECT * FROM orders WHERE date > '2025-01-01'");
const summary = {
  totalOrders: rows.length,
  totalRevenue: rows.reduce((sum, r) => sum + r.amount, 0),
  topProduct: rows.sort((a, b) => b.amount - a.amount)[0]?.product
};
console.log(summary);
```
The model sees three summary values, not 10,000 rows.
3. Multi-Step Operations in a Single Round Trip
Need to check a condition, branch on the result, and loop? In traditional MCP, each step is a separate tool call with a separate round trip through the model. With Code Mode, it’s just… code:
```typescript
let found = false;
while (!found) {
  const messages = await slack.getChannelHistory({ channel: 'C123456' });
  found = messages.some(m => m.text.includes('deployment complete'));
  if (!found) await new Promise(r => setTimeout(r, 5000));
}
console.log('Deployment notification received');
```
Loops, conditionals, error handling — all executed in one shot, using familiar programming patterns that every developer (and every LLM) already knows.
4. Security Through Sandboxing
The generated code runs in a locked-down isolate. No file system. No environment variables. No arbitrary internet access. The only thing the code can do is call the approved MCP tools through their typed APIs. This is actually more secure than giving the model direct tool access, because the execution boundary is explicit and auditable.
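As a toy illustration of that boundary, the sketch below uses Node's built-in vm module. To be clear, vm is explicitly not a real security boundary (production systems use true isolates, as in Cloudflare Workers); what it demonstrates is the key idea that the sandbox's global scope contains only the approved tool bindings, with no fs, no process, and no fetch:

```typescript
import * as vm from "node:vm";

// The ONLY capabilities the agent code receives.
// github here is a hypothetical tool binding; a real one
// would proxy each call to the corresponding MCP server.
const logs: unknown[] = [];
const sandboxGlobals = {
  github: {
    async get_pull_request(_p: { pull_number: number }) {
      return { title: "Fix crash on startup", state: "open" };
    },
  },
  console: { log: (v: unknown) => logs.push(v) },
};
const context = vm.createContext(sandboxGlobals);

async function runAgentCode(code: string): Promise<unknown[]> {
  // Wrap in an async IIFE so the agent's code can use await.
  await vm.runInContext(`(async () => { ${code} })()`, context);
  return logs;
}

// Example agent-written script:
runAgentCode(`
  const pr = await github.get_pull_request({ pull_number: 1234 });
  console.log({ title: pr.title, state: pr.state });
`);
```

Anything the sandbox doesn't explicitly provide simply doesn't exist for the agent's code, which is what makes the boundary auditable.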
When Should You Use Code Mode?
Code Mode isn’t a silver bullet for every situation. Here’s a practical guide:
Code Mode shines when:
- You have many tools (10+) across multiple MCP servers
- Tasks involve chaining multiple tool calls together
- Intermediate results are large (documents, datasets, logs)
- You need loops, conditionals, or data transformation between calls
Traditional MCP is fine when:
- You only have 3–4 tools
- Tasks are simple, single-step operations
- There’s no data passing between tools
- The overhead of code generation would be more than calling the tool directly
The Bigger Picture
Code Mode doesn’t replace MCP — it’s built on top of MCP. Think of the relationship like HTTP and REST: MCP is the underlying protocol that makes tool communication possible, while Code Mode is an architectural pattern for using that protocol more efficiently.
The tools discovered and executed through Code Mode are still MCP tools. The schemas are still MCP schemas. Code Mode just changes how the model interacts with them — from “pick a tool from a menu” to “write a program against a typed API.”
This distinction matters because it means you can adopt Code Mode incrementally. Keep your existing MCP servers. Keep your existing tool definitions. Code Mode simply wraps them in a more efficient interaction layer.
As AI agents grow more capable and connect to more services, the context window tax of traditional tool calling will only get worse. Code Mode is one of the most promising patterns for keeping agents fast, cheap, and accurate as their toolkits scale. This goes beyond optimization — it’s a fundamental rethink of how AI agents should interact with the world.
Code Mode was introduced by Cloudflare and is available as an open-source SDK in the Cloudflare Agents SDK. Anthropic has independently explored the same pattern in their “Code Execution with MCP” research. Implementations are also available in goose (by Block), the UTCP code-mode library, and Palma.ai for enterprise use cases.