Artificial Intelligence · March 15, 2026

How NanoClaw went from a macOS sandboxing project to Docker integration

Docker gives NanoClaw a real shot at making AI agents safe enough to run

NanoClaw moved fast, even for AI tooling.

In about six weeks, Gavriel Cohen’s side project went from a roughly 500-line macOS sandboxing experiment to a Docker integration that gives it a path onto Linux, Windows, CI boxes, and enterprise clusters. That changes its prospects. A clever Mac-only safety layer gets attention. A Docker-backed isolation model has a chance of ending up in real workflows.

The timing tracks. AI agents are being pushed into codebases, internal docs, ticketing systems, chat logs, and CRM records long before most teams have a solid answer to a basic question: what happens when the agent gets too much access? Plenty of companies want the automation. Far fewer want an agent process rummaging through a developer laptop or copying sensitive data into some temp directory.

Cohen says NanoClaw grew out of a bad experience with OpenClaw while he was building an AI-native marketing agency. He’d connected scheduling and chat tools, then found a local file containing all of his WhatsApp messages, including personal conversations, scraped into plain text. He was also looking at a dependency tree he estimated at around 800,000 lines once transitive packages were counted. At that point, trusting the framework on faith starts to look reckless.

So he wrote a much smaller runner built around isolation.

Why Docker matters

NanoClaw’s first version leaned on Apple’s container tech announced at WWDC 2025. Good choice for a fast prototype. It also kept the project stuck on macOS, which is fine for demos and far less useful inside actual engineering organizations.

Docker changes that.

By integrating Docker Sandboxes, NanoClaw gets a security model developers and security teams already know how to inspect, tune, and fight over. That matters. Most security teams already have opinions on seccomp, rootless containers, read-only filesystems, bind mounts, network controls, AppArmor, SELinux, and image provenance. They don't need another proprietary safety story. They need an agent runner that fits systems they already use.

That’s the pitch here. NanoClaw wraps agent actions in short-lived isolated jobs built on standard container machinery. That’s easier to reason about than a large agent platform making broad promises about safety.

The project also caught real momentum. After an initial Hacker News bump, Andrej Karpathy posted about it on X and NanoClaw took off. It reportedly hit 22,000 GitHub stars, 4,600 forks, and more than 50 contributors in a few weeks. Stars don’t prove anything about security. They do say something about demand. A lot of developers want agent tooling that starts from the assumption that the model and its tools are untrusted.

That should have been standard practice from the start.

A narrow technical model

NanoClaw’s core idea is simple enough to hold in your head.

Instead of giving an agent broad host access, each tool invocation runs in its own sandbox with only the files, secrets, and permissions needed for that step. The sandbox starts, does the work, returns the output, and disappears.

That’s a far saner execution model than the usual agent-framework pattern of installing a pile of connectors, granting wide local and network access, and hoping prompt constraints hold.

On Docker, the controls are familiar:

  • process isolation through namespaces
  • CPU, memory, and PID limits through cgroups
  • privilege restrictions with seccomp plus AppArmor or SELinux
  • non-root execution
  • read-only root filesystems
  • narrow bind mounts
  • no network by default, or tightly controlled egress

A secure invocation can look like this:

docker run --rm \
--read-only \
--cap-drop=ALL \
--security-opt=no-new-privileges \
--pids-limit=256 \
--memory=512m --cpus=0.5 \
--network=none \
--tmpfs /tmp:rw,noexec,nosuid,nodev \
--mount type=bind,src=$PWD/tasks/123,dst=/work,ro \
--env FILE_TOKEN=$SHORT_LIVED_TOKEN \
--user 10001:10001 \
ghcr.io/nanoco/agent-tool:v0.3 process --input /work/spec.json

None of that is exotic. That’s why it works.

If the tool can only read a single mounted task directory, has no network access, can’t escalate privileges, and runs as a non-root user on a read-only filesystem, the blast radius gets a lot smaller. If a model starts poking around for ~/.ssh, browser cookies, shell history, cloud credentials, or Slack caches, there’s nothing there unless you mounted it. If it tries to exfiltrate data, --network=none shuts that down.

A lot of agent products still treat this as a secondary concern. It isn’t. Prompt guardrails help. Process isolation is what stops a bad tool call from turning into an incident report.

Small codebases still matter

Part of NanoClaw’s appeal is that Cohen kept the core tiny. Around 500 lines is small enough to audit without losing patience.

That doesn’t make the system automatically safe. You still depend on Docker, the kernel, the container image, the tool binary inside that image, and the policies around mounts and secrets. Containers are not a perfect sandbox, and people who work on hard isolation and multitenancy have been saying that for years. If you need stronger boundaries, you still end up looking at microVMs like Firecracker or something in the Kata Containers family.

Still, smaller orchestration code removes one common source of agent risk: sprawling plugin ecosystems and dependency trees nobody can realistically inspect. A thin control plane plus tightly restricted execution environments is a better fit for this problem.

It’s also easier to test. You can write policy tests for mount rules, egress rules, image digests, allowed syscalls, and secret lifetimes. Good luck doing that confidently across a giant agent stack with dozens of packages pulling in hundreds more.
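A minimal sketch of what one of those policy tests can look like, assuming docker and jq are on the PATH and reusing the image and flags from the example above (the script itself is illustrative, not something NanoClaw ships):

#!/usr/bin/env bash
set -euo pipefail

# Create (but don't start) a container with the hardened flags, then assert its config.
mkdir -p "$PWD/tasks/123"
cid=$(docker create \
  --read-only --cap-drop=ALL --security-opt=no-new-privileges \
  --network=none --user 10001:10001 \
  --mount type=bind,src="$PWD/tasks/123",dst=/work,ro \
  ghcr.io/nanoco/agent-tool:v0.3 process --input /work/spec.json)
trap 'docker rm -f "$cid" >/dev/null' EXIT

# No network attached to the job.
test "$(docker inspect -f '{{.HostConfig.NetworkMode}}' "$cid")" = "none"

# Root filesystem is read-only.
test "$(docker inspect -f '{{.HostConfig.ReadonlyRootfs}}' "$cid")" = "true"

# Runs as the expected non-root user.
test "$(docker inspect -f '{{.Config.User}}' "$cid")" = "10001:10001"

# Exactly one bind mount, and it is read-only.
docker inspect "$cid" | jq -e '.[0].HostConfig.Mounts | length == 1 and .[0].ReadOnly == true' >/dev/null

echo "policy checks passed"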

Where it fits, and where it doesn't

NanoClaw looks well suited to task-level tool execution, especially in developer and data workflows where startup latency matters and container infrastructure already exists.

If images are cached, container startup is usually fast enough that the isolation overhead won't dominate the job. For plenty of code-gen, ETL, validation, doc processing, and API-calling agent steps, a few hundred milliseconds is a reasonable trade. In CI, on workstations, and in Kubernetes clusters, this is workable.
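The overhead is easy to measure on your own hardware. A rough check, using a small public image as a stand-in for a tool image rather than anything NanoClaw-specific:

# Warm the local image cache, then time a hardened no-op invocation.
docker pull alpine:3.20
time docker run --rm --read-only --network=none alpine:3.20 true

The second command reports roughly what each sandboxed step pays in startup cost on that machine.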

There are limits.

Containers share the host kernel. For regulated environments or hostile multi-tenant workloads, that may not be enough. Some organizations are going to want microVM-backed execution or hardware-backed isolation for agent actions that touch customer records, financial systems, or production control planes. Docker is practical. It is not the strongest boundary on offer.
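For teams that do need the stronger boundary, the invocation itself barely changes. A sketch, assuming Kata Containers has been installed and registered as a Docker runtime in /etc/docker/daemon.json (the runtime name comes from that registration, not from NanoClaw):

# Assumed daemon.json entry: { "runtimes": { "kata-runtime": { "path": "/usr/bin/kata-runtime" } } }
docker run --rm \
  --runtime=kata-runtime \
  --read-only --cap-drop=ALL --security-opt=no-new-privileges \
  --network=none --user 10001:10001 \
  --mount type=bind,src=$PWD/tasks/123,dst=/work,ro \
  ghcr.io/nanoco/agent-tool:v0.3 process --input /work/spec.json

Each job then gets its own guest kernel instead of sharing the host's.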

Then there’s policy. A secure runner helps only if the permissions stay narrow. Teams love talking about least privilege right up until somebody mounts the whole project root, passes a long-lived API key through env vars, and opens outbound network access because it's convenient. At that point you've recreated the same mess with cleaner packaging.

So the hard part moves from inventing sandbox tech to enforcing sane defaults and stopping people from bypassing them.

Why developers and security teams will care

Docker gives NanoClaw a shot at fitting into existing workflows instead of sitting off to the side.

That has a few practical consequences.

First, adoption gets easier. Developers already know how to build an image, pin a digest, mount a directory, and inspect a container. Security teams already know how to scan images, generate SBOMs, review registry controls, and write runtime policies.

Second, it lines up with the zero-trust direction most infrastructure teams are already headed. Agent runs can use short-lived tokens, scoped mounts, audited tool invocations, and immutable images. Those patterns fit cleanly into current IAM and DevSecOps setups. They’re boring in a good way.

Third, it makes agent safety testable. You can prove a tool image has no network. You can verify that only /work/spec.json is mounted read-only. You can reject images that run as root or ship known-vulnerable packages. That’s stronger than asking a vendor to explain its AI governance layer.
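The image-level half of that is scriptable with standard tooling. A sketch, reusing the image from the earlier example and treating syft and trivy as stand-in SBOM and scanning tools (neither is tied to NanoClaw):

#!/usr/bin/env bash
set -euo pipefail
IMAGE=ghcr.io/nanoco/agent-tool:v0.3

# Reject images whose default user is root (an empty User field also means root).
user=$(docker image inspect -f '{{.Config.User}}' "$IMAGE")
case "$user" in
  ""|root|0|0:0) echo "image runs as root"; exit 1 ;;
esac

# Record the immutable digest so runs can pin it instead of a mutable tag.
# (RepoDigests is only populated for images pulled from a registry.)
docker image inspect -f '{{index .RepoDigests 0}}' "$IMAGE"

# Generate an SBOM and fail the pipeline on known high-severity vulnerabilities.
syft "$IMAGE" -o spdx-json > sbom.json
trivy image --exit-code 1 --severity HIGH,CRITICAL "$IMAGE"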

The market is moving this way whether agent framework vendors like it or not. Enterprises are going to ask for default-deny tool runners, per-tool scopes, short-lived credentials, and durable audit logs. Platforms that assume the model should have broad ambient access are going to run into trouble in production.

The harder next step

NanoClaw now has a plausible infrastructure story. The next question is whether it can stay disciplined as adoption grows.

Open source projects often start clean and get messy once users ask for plugins, richer integrations, persistent state, and convenience features. That’s usually where the security posture slips. "Just mount the repo." "Just allow outbound internet for package installs." "Just reuse the same token." A lot of bad systems design hides behind the word "just."

If NanoClaw sticks to strict defaults, visible policy controls, and a small trusted core, it could become one of the more useful pieces in the agent stack. If it expands into another everything-platform, the original appeal fades quickly.

For now, the Docker integration is the first step that makes NanoClaw look like something teams could deploy, not just an interesting safety demo.

The AI industry has spent the past year treating agents as an application problem. A lot of this is a containment problem.
