OpenAI’s Agents SDK grows up with sandboxed workspaces and a real runtime story
OpenAI has updated its Agents SDK with two features enterprise teams have been asking for: sandboxed workspaces and a supported runtime stack for long-running agents.
That may sound like plumbing. It is. It’s also the part that usually breaks once an agent leaves the demo stage.
Models can already plan, call tools, and work with files well enough to look good in a notebook. The harder questions come right after that. Where does the code run? What can it access? How do you stop it from roaming across the network, leaking data, or trashing a shared environment after 40 steps and a few retries?
This SDK update goes straight at those problems. It’s available for Python now, with TypeScript support planned, and exposed through the API under standard pricing. OpenAI is also previewing code mode and subagents, which points to a broader push toward modular agent systems that can survive production.
Isolation is the useful part
The strongest piece of this release is also the least flashy.
Sandboxed workspaces give each agent or job a bounded execution environment with scoped file access, restricted tools, controlled network egress, and runtime-injected secrets. That fits how enterprise security teams already think about CI jobs, ephemeral containers, and service accounts.
The controls OpenAI is exposing will look familiar:
- tool_allowlist to limit which tools an agent can call
- fs_mounts for read-only and read-write paths
- net_policies to define outbound access and deny everything else
- short-lived secrets injected at runtime instead of hanging around in memory
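Of those four, runtime-injected secrets are the piece teams most often get wrong with plain environment variables. A minimal sketch of what scoped injection could look like; the vault reference scheme, field names, and workspace shape are assumptions for illustration, not a documented SDK schema:

# Hypothetical workspace fragment; "vault://" references, ttl_seconds, and
# the field names are illustrative assumptions, not a published schema.
workspace = {
    "id": "proj-invoice-recon",
    "secrets": [
        # Resolved by the runtime when the job starts, revoked when it ends;
        # the raw value never sits in a prompt, a log line, or a .env file.
        {"name": "FINANCE_API_TOKEN", "ref": "vault://finance/api-token",
         "ttl_seconds": 900},
    ],
}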
This is where agent projects usually get painful. Tool use is the value proposition, but it’s also the biggest risk surface. Give a model broad filesystem access and unrestricted HTTP calls, and you’ve built a compliance problem with a prompt interface.
A sandbox doesn’t fix everything. It does improve the default posture, and that matters. Most teams don’t need an agent with unlimited freedom. They need one that can read a repo, write to a temp directory, call two internal APIs, and stop there.
That’s a lot easier to approve.
OpenAI is selling the control plane too
The other big change is what OpenAI calls an in-distribution runtime for advanced models. Bad name. Good idea.
OpenAI is packaging a supported orchestration layer around the model so teams can build, test, and run agents against the same control stack they’ll use in production. In practice, that means the pieces every serious agent system ends up building anyway (a minimal loop wiring them together follows the list):
- a planner for breaking goals into steps
- a tool router for dispatch, retries, and timeouts
- a state store for scratchpads and intermediate artifacts
- policy enforcement around allowed actions
- tracing and events for debugging and billing
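Here is that loop in one place. A minimal sketch under obvious assumptions: every name (planner, tool_router, state_store, policy, tracer) is illustrative, showing the shape any agent runtime converges on rather than OpenAI’s actual implementation:

# Illustrative orchestration loop; every name is hypothetical, not an SDK API.
from collections import deque

def run_agent(goal, planner, tool_router, state_store, policy, tracer):
    steps = deque(planner.plan(goal))             # planner: goal -> ordered steps
    while steps:
        step = steps.popleft()
        if not policy.allows(step):               # policy check before dispatch
            tracer.event("blocked", step=step)
            continue
        try:
            result = tool_router.dispatch(step)   # router owns retries and timeouts
        except TimeoutError:
            tracer.event("timeout", step=step)
            steps = deque(planner.replan(goal, state_store))  # re-plan from saved state
            continue
        state_store.save(step, result)            # scratchpad / intermediate artifacts
        tracer.event("step_done", step=step)      # trace feed for debugging and billing
    return state_store.final_artifacts()

The pitch of a vendor-supported runtime is that this loop, including its retry and re-planning behavior, is identical in development and production.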
That matters because agent failures rarely come from one model call. They come from the interaction between planning, state, tool output, retries, partial failure, and weird edge cases after step 27. If development and production handle those pieces differently, you get orchestration drift. A workflow that behaves in testing starts acting strangely under load or after a timeout chain.
OpenAI is trying to remove some of that by owning more of the stack.
There’s an obvious competitive angle. For the past year, plenty of teams have stitched together LangGraph-style orchestration, internal policy checks, telemetry hooks, and some container wrapper nobody wants to maintain. It works, until it doesn’t. A vendor-supported runtime tied closely to frontier models is attractive if you want fewer moving parts and less glue code.
The trade-off is straightforward. You lose flexibility, and probably accept more vendor lock-in than you’d prefer. A lot of enterprise teams will still take that deal.
Long-running agents expose the real problems
OpenAI is framing these updates around long-horizon tasks, and that’s the right frame.
One tool call is manageable. An 80-step workflow that reads files, edits artifacts, retries flaky API calls, and pauses for approval before doing something destructive is a different class of system. Small errors pile up. Shortcuts turn into reliability problems. Variance that looks harmless in a one-shot prompt starts dragging down results over dozens of steps.
That’s why checkpointing, idempotent tools, and human approval gates matter more than another polished planning demo.
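None of those three is exotic. A minimal sketch of the idempotency-plus-approval pattern, with every name assumed for illustration:

# Illustrative pattern, not an SDK API: idempotency keys plus a human gate
# make a long retry chain safe to replay.
applied: set[str] = set()   # a durable store (DB, object storage) in real systems

def run_step(tool_fn, args: dict, idempotency_key: str,
             destructive: bool = False, approved: bool = False):
    if idempotency_key in applied:
        # A retry after a crash or timeout lands here and does nothing twice.
        return {"status": "skipped", "reason": "already applied"}
    if destructive and not approved:
        # Park the step for a human gate instead of acting.
        return {"status": "pending_approval"}
    result = tool_fn(**args)
    applied.add(idempotency_key)   # mark success only after the effect lands
    return {"status": "ok", "result": result}

Run the same step twice after a timeout and the second call is a no-op; mark anything destructive as unapproved and a human has to sign off before it executes.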
A sane setup looks something like this:
# Illustrative configuration; AgentSDK.create and these field names are a
# sketch of the controls described above, not a published API surface.
agent = AgentSDK.create(
    model="frontier-X",
    workspace={
        "id": "proj-mlops-migration",
        "fs_mounts": [
            {"path": "/workspace/repo", "mode": "ro"},  # read the repo, never write it
            {"path": "/workspace/tmp", "mode": "rw"}    # scratch space only
        ],
        "net_policies": {
            "egress_allow": ["https://api.internal.company"],
            "default": "deny"                           # everything else is blocked
        }
    },
    tools={
        "allow": ["repo_browser", "sql_client", "ticket_api"],
        "timeouts": {"sql_client": 15}                  # seconds
    },
    policies={
        "writes": {"allow_paths": ["/workspace/tmp"]},
        "human_gate": [{"step": "apply_migration"}]     # approval before destructive work
    },
    generation={"temperature": 0.2}                     # keep generation conservative
)
The priorities there are sensible. Keep generation conservative. Restrict writes. Deny network access by default. Put a human gate in front of anything destructive.
That’s the kind of autonomy most companies can live with.
Python first makes sense
Shipping Python first is the obvious choice, and likely the right one.
Most enterprise automation still runs through Python. Data pipelines, internal tooling, MLOps scripts, batch jobs, repo automation, support workflows, ticketing glue, even a depressing amount of infrastructure logic. If OpenAI wants this SDK used inside real companies instead of at agent hackathons, Python is where those teams already are.
TypeScript will matter once the focus shifts toward browser workflows, frontend-heavy systems, and customer-facing apps. For the first wave of deployment, Python is the easier path.
There’s also a practical reason. Long-running agents often need access to data tooling, notebooks, ETL jobs, and internal service wrappers that already exist in Python. A Python-native SDK shortens the path from prototype to something a platform team can actually operate.
Where it helps, and where it won’t
This release fits workflows with bounded scope and a lot of coordination work. For example:
- triaging support tickets with internal policy checks
- reconciling invoices against finance systems
- scanning a codebase, preparing a patch, and opening a PR
- reviewing alerts, gathering context, and drafting an incident summary
- performing staged operations with approval gates
These are multi-step, tool-heavy tasks with enough structure to define safe boundaries.
It’s less convincing when tool access is wide open, task definitions are fuzzy, or the environment itself is chaotic. A sandboxed runtime can limit damage, but it won’t rescue a badly designed workflow. If your tool contracts are inconsistent, your APIs are flaky, and the business logic lives across five wiki pages, the agent will still behave like a confused intern with shell access.
That’s not OpenAI’s fault. The SDK also won’t save you from it.
Security teams will care more than prompt engineers
There’s a noticeable shift here.
For the past year, agent tooling has largely followed what model builders and app teams wanted: better planning, richer tool use, more memory, more autonomy. This update leans toward what security, compliance, and platform teams care about: isolation, policy enforcement, auditability, repeatability.
That usually signals a market getting more serious. Enterprise adoption rarely stalls because the model can’t do enough. It stalls because the controls around it are thin.
If OpenAI’s tracing format plugs cleanly into the observability stacks companies already run, and if the policy hooks are flexible enough to match internal approval flows, this gets easier to buy. OpenTelemetry alignment would help, though OpenAI hasn’t put that front and center.
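For instance, if the runtime’s trace events can be forwarded as spans, wiring them into an existing OpenTelemetry pipeline might look like the sketch below. The event shape (type, workspace_id, tool, step fields) is an assumption; the OpenTelemetry calls are the library’s real Python API:

# Sketch: forwarding agent runtime events into an OpenTelemetry pipeline.
# The event dict's fields are assumed, not a documented trace format.
from opentelemetry import trace

tracer = trace.get_tracer("agent-runtime")

def on_agent_event(event: dict):
    # One span per event, tagged so existing dashboards can slice by
    # workspace, tool, and step number.
    with tracer.start_as_current_span(event.get("type", "agent_event")) as span:
        span.set_attribute("agent.workspace", event.get("workspace_id", ""))
        span.set_attribute("agent.tool", event.get("tool", ""))
        span.set_attribute("agent.step", event.get("step", -1))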
It also raises the bar for the rest of the agent ecosystem. Anthropic, Google, Microsoft, and open source frameworks all have some answer for tool use and orchestration. The baseline is shifting. “Production-ready” now needs scoped execution, reliable state handling, and decent telemetry. Without those pieces, the stack looks half-finished.
The downside is vendor gravity
There’s a cost to this approach.
The more OpenAI owns the runtime, the more your agent system starts to depend on OpenAI-specific assumptions about planning, state, policy, tracing, and deployment. That may be acceptable if you’re already committed to its models and want speed. It’s less appealing if you expect to swap providers, mix orchestration backends, or keep your control plane independent.
This is a familiar platform bargain. You get tighter integration and fewer components to assemble. You also inherit someone else’s abstractions.
For a lot of teams, especially those running a 60- to 90-day pilot, that trade-off is fine. The bigger near-term risk is building an agent stack nobody can secure, debug, or operate after the demo.
How I’d approach it
If you’re evaluating this SDK, keep the first pilot narrow.
Pick one workflow with clear boundaries and visible business value. Lock down tool access aggressively. Deny egress by default. Treat the sandbox as untrusted. Make destructive actions idempotent or approval-gated. Turn on full audit logging on day one. Keep the prompts boring. In production, reliability matters a lot more than flair.
That’s what OpenAI seems to understand with this release.
The stack around agents is finally starting to look like something a serious company could run.