Artificial Intelligence · October 25, 2025

OpenAI acquires Sky to bring AI actions directly into macOS

OpenAI’s Sky deal puts the Mac at the center of the agent race

OpenAI has acquired Software Applications, the startup behind Sky, an unreleased AI interface for macOS that can sit above the desktop, read what’s on screen, and take actions across apps.

That pushes OpenAI past the chat window and into the OS. If ChatGPT is where people ask for help, Sky points to the software that can actually carry out the steps: clicking buttons, reading UI state, and moving between Xcode, Terminal, Safari, Slack, and the rest of a developer’s machine.

OpenAI also bought a team with real Apple automation history. Sky founders Ari Weinstein and Conrad Kramer previously built Workflow, which Apple acquired and turned into Shortcuts. Kim Beverett, also joining through the deal, has held senior product roles across Safari, WebKit, Privacy, Messages, Mail, Phone, FaceTime, and SharePlay. TechCrunch reports that OpenAI Head of ChatGPT Nick Turley and Applications CEO Fidji Simo led the acquisition. Terms weren’t disclosed. Sky had raised $6.5 million from investors including Sam Altman (through a fund), Dylan Field, Context Ventures, and Stellation Capital.

That pedigree matters. Workflow’s founders understand something a lot of AI startups still miss: desktop automation gets useful when it hooks into stable system primitives, not when it points a vision model at pixels and hopes.

The OS layer OpenAI didn’t have

OpenAI already has strong models, multimodal input, and function calling. What it hasn’t had is a convincing desktop runtime for any of that. Browser agents help, but they’re boxed in. IDE assistants help, but they’re narrow. A Mac layer changes the product.

A system like Sky can watch active windows, inspect UI structure where macOS allows it, and act across applications. For a developer, that could mean one agent that:

  • reads an error in Xcode
  • switches to Terminal and runs a test
  • opens a browser tab for logs
  • summarizes the root cause in Slack
  • queues a fix for review

That sounds straightforward. In practice, this is where current AI products often fall apart. Every tool has its own state, permissions, and failure modes. A desktop agent has a real shot at stitching that mess together.

The timing also fits the broader market. Microsoft keeps pushing Copilot deeper into Windows. Apple is building out Apple Intelligence, reworking Siri, and exposing local model access through its Foundation Models framework. The fight has moved from chat UI to the action layer.

How a Mac agent like this probably works

Sky hasn’t shipped publicly, so the implementation details are inferred from the reporting. Still, the shape of the stack is fairly clear.

At the bottom is a perception layer. On macOS, that likely means some mix of ScreenCaptureKit for screen access, accessibility APIs such as AXUIElement for structured UI data, and computer vision for layouts, text, icons, and status indicators. Accessibility metadata matters a lot. Pixels alone are brittle. A model that can spot a button is useful. A system that also knows the button’s role, label, and window hierarchy is much more dependable.
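
To make the structured half of that concrete, here’s a minimal Swift sketch that reads the focused element’s role and title through the accessibility API. It assumes a command-line tool that has already been granted the Accessibility permission; the helper function is an invention for the sketch, the calls are Apple’s.

    import AppKit
    import ApplicationServices

    // Read one accessibility attribute, or nil if the app doesn't expose it.
    func attribute(_ element: AXUIElement, _ name: String) -> CFTypeRef? {
        var value: CFTypeRef?
        let err = AXUIElementCopyAttributeValue(element, name as CFString, &value)
        return err == .success ? value : nil
    }

    guard AXIsProcessTrusted() else {
        fatalError("Grant Accessibility access in System Settings first")
    }

    // Ask the system-wide element for whatever currently has focus.
    let systemWide = AXUIElementCreateSystemWide()
    if let focused = attribute(systemWide, kAXFocusedUIElementAttribute) {
        let element = focused as! AXUIElement
        let role = attribute(element, kAXRoleAttribute) as? String ?? "unknown"
        let title = attribute(element, kAXTitleAttribute) as? String ?? "untitled"
        print("focused element: role=\(role), title=\(title)")
    }

This is the dependability gap in miniature: the same button a vision model has to find in pixels arrives here with a role, a label, and a place in the window hierarchy.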

Then comes the planning layer. In practice, that’s a loop, sketched in code just after the list:

  1. inspect state
  2. decide the next step
  3. call a tool
  4. observe the result
  5. repeat
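
A hedged sketch of that loop in Swift, with hypothetical stand-in types rather than anything Sky has published; the bounded observe-decide-act structure is the point.

    // DesktopState, ToolCall, and the closures are invented for this sketch.
    struct DesktopState { let frontmostApp: String; let uiSummary: String }

    enum ToolCall {
        case click(label: String)
        case typeText(String)
        case shell(String)
        case done(summary: String)
    }

    func runAgent(goal: String,
                  inspect: () -> DesktopState,
                  plan: (String, DesktopState, [String]) -> ToolCall,
                  execute: (ToolCall) -> String,
                  maxSteps: Int = 25) -> String {
        var observations: [String] = []
        for _ in 0..<maxSteps {                             // 5. repeat, bounded
            let state = inspect()                           // 1. inspect state
            let step = plan(goal, state, observations)      // 2. decide the next step
            if case .done(let summary) = step { return summary }
            let result = execute(step)                      // 3. call a tool
            observations.append(result)                     // 4. observe the result
        }
        return "Stopped after \(maxSteps) steps without finishing: \(goal)"
    }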

The tools probably include UI automation, typed text input, file access, app intents, shortcuts, deep links, and shell or network operations under policy constraints. OpenAI’s existing model stack fits neatly here. Vision models provide grounding. Function calling gives the planner structured ways to act. Smaller local models can handle quick classification or routing, while larger cloud models do the heavier reasoning.
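
For the function-calling piece, a desktop action could be declared to the model roughly like this. The envelope follows OpenAI’s documented tool schema; the click_button tool itself is a made-up example, not a real Sky or OpenAI tool.

    import Foundation

    // Hypothetical tool declaration in OpenAI's function-calling format.
    let clickButtonTool: [String: Any] = [
        "type": "function",
        "function": [
            "name": "click_button",
            "description": "Click a control identified by its accessibility label in the frontmost window",
            "parameters": [
                "type": "object",
                "properties": [
                    "label": [
                        "type": "string",
                        "description": "Accessibility label of the control to click"
                    ]
                ],
                "required": ["label"]
            ]
        ]
    ]

    let payload = try! JSONSerialization.data(withJSONObject: clickButtonTool,
                                              options: .prettyPrinted)
    print(String(data: payload, encoding: .utf8)!)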

Latency matters. Desktop agents feel broken if they stop for several seconds before every click. Users need fast feedback even when the full plan takes longer. A good system will show intent, stream intermediate steps, and cache app-specific UI maps so it doesn’t have to rediscover the same controls every time.
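
The cache itself can be simple. A sketch with hypothetical types, keyed by bundle ID and app version so controls are only rediscovered after an update:

    // Invented types: a UI map is just labels mapped to element paths here.
    struct UIMap {
        let controls: [String: String]   // accessibility label -> element path
    }

    final class UIMapCache {
        private var cache: [String: UIMap] = [:]

        func map(for bundleID: String, version: String,
                 rebuild: () -> UIMap) -> UIMap {
            let key = "\(bundleID)@\(version)"
            if let hit = cache[key] { return hit }   // fast path: no rediscovery
            let fresh = rebuild()                    // slow path: walk the UI once
            cache[key] = fresh
            return fresh
        }
    }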

The best version of this product won’t rely on screen scraping unless it has to. It will prefer semantic hooks first: AppIntents, Shortcuts actions, NSUserActivity, deep links, CLI endpoints, structured APIs. That’s where the Workflow lineage matters. These founders know the gap between a flashy automation demo and something that survives OS and app updates.
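
One way to encode that preference is a fixed resolution order that only drops to pixels when nothing structured exists. Everything named below is hypothetical; the ordering is the point.

    import Foundation

    // Invented capability table for one app.
    struct AppCapabilities {
        var intents: [String: String] = [:]       // action -> AppIntent identifier
        var deepLinks: [String: URL] = [:]        // action -> registered URL
        var cliCommands: [String: String] = [:]   // action -> shell command
        var axLabels: [String: String] = [:]      // action -> accessibility label
    }

    enum ActionRoute {
        case appIntent(String)
        case deepLink(URL)
        case cli(String)
        case axPress(label: String)
        case pixels                               // vision fallback, last resort
    }

    func route(_ action: String, app: AppCapabilities) -> ActionRoute {
        if let intent = app.intents[action] { return .appIntent(intent) }     // most stable
        if let url = app.deepLinks[action] { return .deepLink(url) }
        if let cmd = app.cliCommands[action] { return .cli(cmd) }
        if let label = app.axLabels[action] { return .axPress(label: label) } // structured UI
        return .pixels                                                        // screen scraping
    }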

The security problems aren’t edge cases

An AI that can read your screen and act on your behalf is useful. It’s also a security problem.

The biggest issue is prompt injection through the UI. If the agent reads arbitrary text from a browser tab, document, pull request comment, or chat message, that text can carry instructions meant to hijack the model. A malicious page doesn’t need kernel access. It just needs to tell the agent to copy credentials from the next window and paste them somewhere else, then hope the model obeys.

That’s one of the main reasons desktop and browser agents still feel shaky in serious environments.

A responsible architecture needs several layers of defense (a minimal policy sketch follows the list):

  • hard permission boundaries for which apps and windows the agent can access
  • action allowlists, especially for Terminal, browsers, payments, admin settings, and production tooling
  • provenance checks so the system can distinguish trusted UI from untrusted web content
  • confirmation gates for high-risk actions such as deploys, deletes, transfers, and account changes
  • detailed logs so users and security teams can see what the agent did
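
The permission boundary, the allowlist, and the confirmation gate compose naturally into a single policy check. A minimal sketch, with invented names and an intentionally tiny rule set:

    enum Verdict { case allow, confirmRequired, deny }

    struct ActionPolicy {
        let allowedApps: Set<String>        // hard permission boundary
        let allowedActions: Set<String>     // action allowlist
        let highRiskActions: Set<String>    // confirmation gate

        func evaluate(bundleID: String, action: String) -> Verdict {
            guard allowedApps.contains(bundleID),
                  allowedActions.contains(action) else { return .deny }
            if highRiskActions.contains(action) { return .confirmRequired }
            return .allow
        }
    }

    let policy = ActionPolicy(
        allowedApps: ["com.apple.dt.Xcode", "com.apple.Terminal"],
        allowedActions: ["run_tests", "read_logs", "deploy"],
        highRiskActions: ["deploy"])

    print(policy.evaluate(bundleID: "com.apple.Terminal", action: "deploy"))
    // confirmRequired: surface a confirmation dialog, then log the decision

Provenance checks and logging sit around this core: the planner should know whether the text it just read came from trusted UI or an arbitrary web page, and every verdict should leave an audit trail.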

Apple’s privacy stance matters here too. macOS already treats screen recording and accessibility access as sensitive permissions. If OpenAI pushes agents further into the desktop, Apple will likely tighten those controls and probably add new APIs built for safer AI action. That would make sense. The current permission model wasn’t built for software that continuously watches, reasons, and acts.

Enterprise buyers will care even more. SecOps teams aren’t going to accept “the user approved it” as a complete answer when an AI agent touches customer data, source code, or production systems. Auditability and policy control stop being optional at that point.

This puts pressure on Apple

Apple has the home field advantage on macOS, but it moves carefully, especially around privacy and system behavior. OpenAI moves faster and tends to tolerate more ambiguity while a product is taking shape.

That creates an opening. If OpenAI ships a Mac agent that feels genuinely useful before Apple has a polished native answer, it can shape expectations early.

Still, Apple controls the OS. It can favor its own frameworks, push developers toward sanctioned automation paths, and make life difficult for broad screen-level agents if it wants to. A third-party agent on macOS always lives inside Apple’s rules. OpenAI can build a strong layer on top, but it doesn’t control the substrate.

That’s why this deal is more interesting than another model release. It sits right at the intersection of product design, systems architecture, and platform politics.

Developers should read this as a platform shift

If you build Mac software, or any app that desktop agents may touch, the takeaway is simple: give the agent better handles than your UI.

Structured automation surfaces are now a product feature. They also make your own integrations and test tooling less fragile.

The priorities are familiar, but they matter more now (a sketch follows the list):

  • Expose AppIntents for common actions.
  • Add Shortcuts support where it makes sense.
  • Offer deep links and NSUserActivity for restoring context.
  • Keep a documented CLI or local API for headless workflows.
  • Return machine-readable results and explicit errors.
  • Add dry-run modes and confirmation gates for destructive operations.
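
The first and last items can be as small as one type. Here’s a sketch using Apple’s AppIntents framework; ExportReportIntent and its parameters are invented, but the protocol and the @Parameter wrapper are the real API.

    import AppIntents

    // Hypothetical typed action: export a report, with a dry-run escape hatch.
    struct ExportReportIntent: AppIntent {
        static var title: LocalizedStringResource = "Export Report"

        @Parameter(title: "Report Name")
        var reportName: String

        @Parameter(title: "Dry Run", default: false)
        var dryRun: Bool

        func perform() async throws -> some IntentResult & ReturnsValue<String> {
            if dryRun {
                // Machine-readable preview instead of a side effect.
                return .result(value: "Would export \(reportName); no changes made")
            }
            // ... do the actual export here, then report explicit state ...
            return .result(value: "Exported \(reportName)")
        }
    }

An intent like this shows up in Shortcuts automatically, returns a typed result instead of a screenshotable toast, and gives an agent something far sturdier to call than a button location.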

If your app forces the agent to infer everything from screenshots and button positions, it will fail in messy ways. If your app exposes typed actions and predictable state, the agent has a real chance of being fast, safe, and useful.

There’s a defensive angle too. Start thinking about redaction, trusted UI regions, and provenance signals. If your interface shows secrets, customer records, or internal tokens, assume an OS-level agent may eventually be looking at that screen. You’ll want ways to mark sensitive areas, suppress them in captures, or at least make their status explicit.
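
macOS already offers one concrete lever here: setting a window’s sharingType to .none opts it out of screen capture by other processes, which also blinds capture-based agents to that window.

    import AppKit

    // Windows holding secrets can opt out of capture entirely; their contents
    // are excluded from other apps' screenshots and screen recordings.
    func excludeFromCapture(_ window: NSWindow) {
        window.sharingType = .none
    }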

For developer tools especially, this could become a dividing line. The ones that cooperate with agents will be easier to use, easier to automate, and easier to govern. The ones that don’t will still work, but they’ll feel clumsy next to software that exposes clear actions to the machine sitting above the desktop.

OpenAI bought a path into daily computer use, right where user intent turns into clicks, commands, and mistakes. That’s a powerful position if the company can make the agent competent enough to trust and constrained enough to hold up on real machines.
