Stop Running AI Agents Naked: A Developer’s Guide to Sandboxing
You’re deep in a refactor, you hand off a task to your AI coding agent, and you walk away to get coffee. When you come back, your project directory has been restructured — but so has everything else. The agent followed a chain of reasonable-looking steps that ended somewhere you never intended.
This isn’t a hypothetical. It’s happened in documented incidents with real consequences. And most developers are still running their AI agents without any containment at all.
This post covers what sandboxing actually protects you from (and what it doesn’t), the realistic options available today, and how to pick the right one for your workflow — without spending an afternoon fighting Docker flags.
Table Of Contents
- The actual threat model
- What sandboxing actually protects (and what it doesn’t)
- Your options
- Sandboxing and network access: the hidden gap
- Comparing the options
- Practical considerations
- Where to start
The actual threat model
Before picking a containment strategy, it helps to be precise about what you’re containing. There are three distinct failure modes, each with different causes and mitigations.
Accidental destruction
AI agents can make catastrophic mistakes without any malice or attack. They misinterpret scope, follow the letter of an instruction instead of its spirit, or chain together individually reasonable steps that compound into a disaster. In one widely reported incident, an agent deleted an entire company database, backups included, in under ten seconds. In another, an agent wiped a user’s home directory.
These failures aren’t bugs in the agent so much as a consequence of giving a powerful tool unconstrained access to your system.
Prompt injection
Prompt injection is the more adversarial failure mode. Malicious instructions can be embedded in content the agent reads — a file, a webpage, a comment in code — and the agent may execute them as if they came from you. A documented example from 2026 involved injected instructions that caused agents to delete source code. The attack surface is anything the agent reads: documentation, READMEs, GitHub issues, web search results.
Supply chain attacks
When you ask an agent to clone a repo and get started, you’re trusting that repo’s contents. Research has documented npm packages and repositories that include instructions specifically crafted to manipulate AI agents. The agent reads a README.md that contains a hidden prompt, and suddenly it’s doing something you didn’t ask for.
All three failure modes get significantly worse when the agent has unrestricted access to your filesystem, your credentials, and your network. A sandboxed agent can still be fooled — but the damage is bounded.
What sandboxing actually protects (and what it doesn’t)
Before you invest time setting one up, be clear on what a sandbox actually gives you. Overselling it is almost as bad as skipping it.
What sandboxing limits:
- Filesystem access to your project directory (not your whole system)
- Access to credentials, SSH keys, and secrets stored elsewhere on your machine
- The ability to modify system configuration or other projects
- Lateral movement to other processes and services running locally
What it doesn’t prevent:
- Destruction within the project directory — the agent still has write access to what you gave it
- Network-based data exfiltration if you allow internet access (the agent can still make outbound requests)
- Mistakes in code the agent writes, or logic errors in the changes it makes
- Prompt injection from content within the mounted directory
A sandbox is a containment boundary, not a security guarantee. Think of it as limiting the blast radius, not eliminating the risk. You still need good backups, code review, and healthy skepticism about what the agent does.
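To make the “credentials stored elsewhere on your machine” point concrete, here’s a rough sketch that lists the secret files an unsandboxed agent could read under your home directory. The function name list_exposed_secrets and the path list are illustrative assumptions, not an exhaustive audit.

```shell
#!/usr/bin/env bash
# Sketch: print well-known credential locations that exist under a home
# directory. The candidate list is illustrative and far from complete.
list_exposed_secrets() {
  local home_dir="${1:-$HOME}"
  local candidate
  for candidate in \
    .ssh \
    .aws/credentials \
    .netrc \
    .npmrc \
    .docker/config.json \
    .config/gh/hosts.yml \
    .kube/config
  do
    # -e matches both files and directories (e.g. ~/.ssh is a directory).
    if [ -e "$home_dir/$candidate" ]; then
      printf '%s\n' "$home_dir/$candidate"
    fi
  done
}
```

Running `list_exposed_secrets` on your own machine shows what a naked agent can reach today. Inside a sandbox that mounts only the project directory, the same call should print nothing.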
Your options
Three tools cover most practical cases. They differ significantly in ease of use, flexibility, and what they actually isolate.
Docker Sandboxes (sbx) — easiest path
Docker Sandboxes is a newer tool that makes sandboxed AI agents a one-liner:
cd ~/dev/my-project
sbx run claude
It’s worth understanding what’s actually happening here, because the implementation matters. This tool uses microVMs — lightweight virtual machines — rather than traditional Docker containers. MicroVMs offer stronger isolation than Linux containers because each sandbox runs its own kernel, which means a container escape (a real attack class against Docker) is significantly harder.
Notable properties:
- You don’t need Docker installed. The sbx binary handles everything.
- Your project directory is mounted into the sandbox; nothing else on your system is accessible.
- Authentication credentials are persisted across sandbox runs, so you don’t re-authenticate every time.
- The sandbox has an optional network policy (egress firewall), which can prevent outbound connections if you want that.
- Your project directory is not modified — no lockfiles, no config files, nothing.
- Within the sandbox, Docker itself is available, so the agent can start its own containers if needed.
It supports several agents beyond Claude: Codex, Copilot, Cursor, OpenCode, and more.
sbx is relatively new, which means it carries early-adoption risk — expect rough edges, evolving APIs, and limited community troubleshooting resources compared to Docker. MicroVMs also have slightly more startup overhead than plain containers, though in practice this is measured in seconds, not minutes. Because sbx manages the environment for you, you also have less control over what’s installed inside the sandbox compared to a custom Dockerfile.
Dev Container CLI — more control, more setup
The Dev Container CLI (devcontainer) is a more established tool backed by the VS Code and GitHub Codespaces ecosystem. It uses a devcontainer.json configuration file to define the environment.
devcontainer up --workspace-folder .
devcontainer exec --workspace-folder . claude
The payoff for the additional setup is significant configurability. You can pin exact tool versions, add language runtimes, install specific dependencies, and share the same environment definition across your team. If you already use dev containers for development environment standardization, extending this to AI agent sandboxing is a natural fit.
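As an illustration of that configurability, here’s a minimal sketch that writes a devcontainer.json pinning a base image and a Node version, then installs the agent on container creation. The helper name write_devcontainer, the "agent-sandbox" name, the Node version, and the install command are assumptions for this example; adjust them to whatever your agent actually needs.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal devcontainer.json into a project directory.
# write_devcontainer is a hypothetical helper; all values are examples.
write_devcontainer() {
  local project_dir="${1:-$PWD}"
  local config="$project_dir/.devcontainer/devcontainer.json"

  # Never clobber a configuration the team already maintains.
  if [ -e "$config" ]; then
    echo "$config already exists; leaving it untouched" >&2
    return 1
  fi

  mkdir -p "$project_dir/.devcontainer"
  cat > "$config" <<'EOF'
{
  "name": "agent-sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/node:1": { "version": "22" }
  },
  "postCreateCommand": "npm install -g @anthropic-ai/claude-code",
  "remoteUser": "vscode"
}
EOF
}
```

After `write_devcontainer .`, the two devcontainer commands shown earlier should pick up the new file. Committing it gives the whole team the same reviewed environment.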
The downside is friction: you need a devcontainer.json in each project (or a shared template), and debugging container startup issues requires Docker knowledge. This adds a file to every repo, which may or may not align with your preferences.
Unlike sbx, Dev Container CLI uses standard Docker under the hood, so it inherits Docker’s isolation model — stronger than running naked, weaker than microVMs.
Raw Docker — maximum control, real effort
You can write a Dockerfile and a docker run invocation yourself. Here’s what a reasonably hardened version looks like:
docker run --rm -it \
  --network=none \
  --read-only \
  --tmpfs /tmp:size=512m \
  -v "$PWD":/workspace:rw \
  -w /workspace \
  --user 1000:1000 \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  your-claude-image:latest
This gives you complete control over every aspect of the environment, which is useful in regulated contexts or when you need to audit exactly what’s running. Keep in mind that --network=none removes all network access, including the agent’s own calls to its model API, so an agent that depends on a hosted model needs a narrow egress rule instead. The cost is that credential persistence, network policy, and image management are all your problem. You’ll spend real time debugging flags, handling authentication, and maintaining images as tools update. For most individual developers, this effort doesn’t pay off relative to the alternatives above, but if your threat model requires it, this path is available.
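One easy mistake with a hand-rolled command is mounting "$PWD" while sitting in your home directory, which hands the agent everything the sandbox was meant to hide. The sketch below wraps the same flags in a guard; is_safe_workspace and run_sandboxed are hypothetical helper names, and your-claude-image:latest is the placeholder image from the command above.

```shell
#!/usr/bin/env bash
# Sketch: refuse to mount a directory that is too broad to be a sandbox.
# Both function names are hypothetical helpers for this example.

is_safe_workspace() {
  local dir home
  # Resolve symlinks so the comparison uses real paths.
  dir="$(CDPATH= cd -- "$1" 2>/dev/null && pwd -P)" || return 1
  home="$(CDPATH= cd -- "$HOME" 2>/dev/null && pwd -P)" || home="$HOME"
  case "$home/" in
    "$dir"/*) return 1 ;;  # dir is $HOME itself or one of its ancestors
  esac
  # The filesystem root is never an acceptable workspace.
  [ "$dir" != "/" ]
}

run_sandboxed() {
  local dir="${1:-$PWD}" image="${2:-your-claude-image:latest}"
  if ! is_safe_workspace "$dir"; then
    echo "refusing to mount '$dir': not a project-scoped directory" >&2
    return 1
  fi
  # Docker bind mounts want an absolute host path.
  dir="$(CDPATH= cd -- "$dir" && pwd -P)" || return 1
  docker run --rm -it \
    --network=none \
    --read-only \
    --tmpfs /tmp:size=512m \
    -v "$dir":/workspace:rw \
    -w /workspace \
    --user 1000:1000 \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    "$image"
}
```

The guard only catches the obvious cases (the root, your home directory, and anything above it). A project directory that happens to contain secrets still gets mounted as-is.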
Sandboxing and network access: the hidden gap
Most sandbox setups, including the defaults for sbx, allow the AI agent outbound network access. This is often necessary — agents need to fetch documentation, install packages, and interact with APIs. But it also means that if an agent is compromised via prompt injection, it can still exfiltrate data: POST your code to an external endpoint, send your git history somewhere, or trigger external webhooks.
If this is a concern for your context — and in a work environment handling proprietary code, it should be — you want to look at:
- Egress firewalling: sbx has an optional network policy that can restrict what the sandbox can reach. Raw Docker can use --network=none or a custom bridge with firewall rules. Either way, you’ll need to explicitly allowlist what the agent legitimately needs.
- Allowlisting by domain: Allow api.anthropic.com, block everything else. This is harder to configure but significantly reduces exfiltration risk.
How tight your egress policy should be depends on what the agent is actually doing. A coding agent that only calls the LLM API can run with very restricted outbound access. One that needs to hit a live staging environment cannot.
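The decision a domain allowlist makes can be sketched in a few lines. This is only the matching logic, under the assumption that something else (a firewall rule or an egress proxy) actually enforces it; ALLOWED_HOSTS and is_allowed_host are hypothetical names, and the single entry mirrors the example above.

```shell
#!/usr/bin/env bash
# Sketch: the matching rule behind a domain allowlist.
# Enforcement belongs in a firewall or proxy; this only shows the decision.
ALLOWED_HOSTS=("api.anthropic.com")

is_allowed_host() {
  local host allowed
  # Hostnames are case-insensitive, so normalize before comparing.
  host="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  for allowed in "${ALLOWED_HOSTS[@]}"; do
    # Exact match only: suffix or substring matching would let
    # "api.anthropic.com.evil.example" through.
    if [ "$host" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}
```

Exact matching is the conservative choice here. If the agent legitimately needs a package registry or a staging host, add each one explicitly instead of loosening the comparison.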
Comparing the options
| | sbx | Dev Container CLI | Raw Docker |
|---|---|---|---|
| Setup time | Minutes | Hours | Hours–Days |
| Isolation strength | High (microVM) | Medium (container) | Medium (container) |
| Flexibility | Low | High | Maximum |
| Project file added | No | Yes (devcontainer.json) | Optional |
| Docker required | No | Yes | Yes |
| Network control | Built-in policy | Manual | Manual |
| Credential persistence | Automatic | Manual | Manual |
| Maturity | New (2026) | Established | Established |
Use sbx if you want the fastest path to being safer, you’re working on personal projects, or you want to minimize configuration overhead. The isolation is genuinely stronger than containers due to microVMs.
Use Dev Container CLI if you already use dev containers, you need a reproducible environment shared across a team, or you want to precisely define what’s in the sandbox.
Use raw Docker if you’re in a context with compliance requirements, you need to audit every detail of the sandbox, or you’re building an automated pipeline where a bespoke image makes more sense than a general-purpose tool.
Practical considerations
Filesystem access is still write access. Even in a sandbox, the agent can delete, overwrite, or corrupt everything in your project directory. This isn’t an argument against sandboxing — it’s an argument for also having good version control hygiene. Commit before handing a task to an agent. If you’re not comfortable with the agent undoing hours of work, don’t hand it a three-hour task.
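The “commit before handing off” habit is easy to script. Here’s a minimal sketch, assuming you’re fine with staging everything in the working tree; the function name checkpoint_before_agent and the commit message are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Sketch: record a checkpoint commit before an agent session starts.
# Stages ALL changes, including untracked files, which may be more than
# you want in a real repository.
checkpoint_before_agent() {
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || {
    echo "not a git repository; refusing to continue" >&2
    return 1
  }
  git add -A
  # --allow-empty keeps the checkpoint even when the tree is already clean.
  git commit --quiet --allow-empty -m "checkpoint: before agent session"
  # Print the commit hash so you know exactly where to roll back to.
  git rev-parse HEAD
}
```

If the session goes badly, `git reset --hard <sha>` followed by `git clean -fd` returns the tree to the checkpoint. Because .git lives inside the mounted directory, the agent can damage it too, so pushing the checkpoint to a remote first is a sensible extra step.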
Sandboxing is per-session, not per-action. You give the agent access to a directory, and it has that access for the entire session. You can’t dynamically restrict what it touches mid-task. This is a genuine limitation of the current model.
Startup overhead is real but manageable. MicroVMs take a few seconds to start. For interactive sessions, this is a one-time cost. For automated pipelines running many short agent tasks, it may add up.
Credential isolation has limits. sbx persists your agent credentials to avoid re-authentication. This is convenient, but it means the credentials are accessible to the sandbox. If the sandbox is compromised, so are those credentials. Understand what “credential isolation” actually means for the tool you’re using.
Where to start
If you’re currently running an AI coding agent directly on your machine with no containment, the immediate step is to try sbx:
# Install sbx from https://www.docker.com/products/docker-sandboxes/
sbx run claude
It requires almost no setup and meaningfully reduces your risk exposure. Run it that way for a week and see what friction you hit.
If you’re on a team, the slightly higher setup cost of Dev Containers probably pays off: you get consistent environments and a shared configuration that can be reviewed and audited. The official Dev Container spec is a reasonable starting point.
Either way, the bar here is low. Moving from “no sandbox” to “any sandbox” contains the most catastrophic failure modes: an agent that stumbles or is manipulated can no longer touch your SSH keys, your other projects, your global config, or anything outside the directory you gave it.
The tools exist. The setup time is now measured in minutes, not hours. There’s no good reason to keep running naked.