Code Mode MCP Pattern

Code Mode is a pattern for using MCP) tools where, instead of exposing each tool directly to the model as a callable function, you give the model a typed API (TypeScript or Python) and ask it to write code that calls those tools. The code runs in a sandbox, and only the final result comes back to th

Canonical version: Code Mode MCP Pattern.

Code Mode is a pattern for using MCP tools where, instead of exposing each tool directly to the model as a callable function, you give the model a typed API (TypeScript or Python) and ask it to write code that calls those tools. The code runs in a sandbox, and only the final result comes back to the model. The term was popularized by Cloudflare in 2025, and the same idea now ships in libraries like FastMCP.

The motivation is simple. Models have seen enormous amounts of real code during training, and very little synthetic tool-calling. So they're much better at writing a few lines of code than at chaining a dozen special-token tool calls by hand. As Cloudflare put it: LLMs are better at writing code to call MCP than at calling MCP directly.

The problem it solves

Classic MCP wiring has two costs that grow fast:

  • Every tool definition lives in the Context Window. Connect a few servers with a few hundred tools between them, and you can burn tens of thousands of tokens before the model has even read the user's request.
  • Every intermediate result round-trips through the model. Tool A's output gets copied into the model's input just to become the argument for tool B. The model pays, in tokens and in latency, to shuffle data it never actually needed to reason about.

Keep that second point in mind. It's the one most people miss.

How it works

  1. Schema to API. On connection, the MCP server's schema gets converted into typed interface definitions with doc comments. The model reads an API, not a flat list of tools.
  2. The model writes code. It produces a small script that calls that API. It can chain several operations, filter, loop, branch, whatever the task needs, all in one shot.
  3. Sandbox execution. The code runs in an isolated sandbox. Cloudflare uses V8 isolates that spin up in a few milliseconds; FastMCP runs Python with time and memory limits (30s and 100MB by default). Network access is locked down so credentials can't leak.
  4. Only the result returns. Intermediate values stay in the sandbox. The model gets the answer, not the plumbing.

Some implementations go a step further with progressive discovery: rather than dumping every tool upfront, they expose a couple of meta-tools (typically search and execute) so the model looks up only what it needs, when it needs it.

Why it matters: context, efficiency, scalability

This is where the pattern earns its keep.

  • Context window. You stop paying the upfront tax of loading every schema. Cloudflare's headline example: for a large surface like the full Cloudflare API, Code Mode cut input tokens by 99.9%. The traditional equivalent would have needed around 1.17 million tokens, which is more than most context windows can even hold. A bloated context isn't just expensive; it actively confuses the model and degrades its answers.
  • Efficiency. Because intermediate results never flow back through the model, you cut both token spend and round-trips. A task that used to be five separate tool calls (five trips through the neural network) becomes one script and one result.
  • Scalability. With the search + execute approach, context usage stays roughly constant no matter how big the underlying API grows. You can even compose multiple MCP servers behind a single gateway (Cloudflare calls these "Code Mode Server Portals"), so adding the tenth or the hundredth server doesn't flood the agent. The thing that used to break first under growth, the context window, stops being the bottleneck.

My Opinion

I think this pattern genuinely matters, and it maps directly to a problem I keep hitting myself: the more agentic resources you add (AI Skills, MCP servers, agents), the faster they fill the context window and confuse the model. See Agentic Resource Discovery Specification (ARD) for the discovery side of the same pain. Code Mode attacks it from the MCP side.

References


About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.

If you want to follow my work, then become a member and join our community.

Ready to get to the next level?

If you're tired of information overwhelm and ready to build a reliable knowledge system:

Found this valuable? Share it with someone who needs it.

Join 6,000+ readers. Get practical systems for knowledge & AI. Free.

Subscribe ✨

Free: Knowledge System Checklist

A clear roadmap to building your own knowledge system. Subscribe and get it straight to your inbox.

6,000+ readers. No spam. Unsubscribe anytime.