OpenAI’s ChatGPT app store is live, and developers should treat it like a new front end
OpenAI has opened submissions for a ChatGPT app directory and is rolling out app discovery inside ChatGPT’s tools menu. Its new Apps SDK, still in beta, gives developers a formal way to plug services into ChatGPT so the model can call them during a conversation.
That creates a new distribution channel, but it also changes how software gets used. In a normal product flow, users browse a UI, click buttons, fill forms, and decide what happens next. In ChatGPT, the model does more of the routing. It decides when your app is relevant, which action to call, and what parameters to send. Your API becomes something the assistant picks mid-task.
If you build SaaS, internal tools, data products, or workflow software, that shift matters now.
Why this matters more than another plugin cycle
Platform stores aren't new. Apple had apps. Slack had integrations. Alexa had skills. OpenAI has already worked through plugins, tools, and function calling. It's easy to shrug and file this under the same pattern.
That misses what changed.
A ChatGPT app is a callable capability inside an active conversation. Users don't have to hunt through menus. They say, "find me a two-bedroom apartment near downtown under $3,000," or "turn this outline into a slide deck," and the system decides whether Zillow, Canva, Spotify, Expedia, or your product should handle part of the request.
That changes product design in a few obvious ways:
- discovery depends partly on the directory, but also on whether the model understands your app well enough to invoke it
- action design matters more than surface polish
- latency gets judged inside a chat turn, not a standalone app session
- vague APIs turn into a problem fast
OpenAI previewed apps from Expedia, Spotify, Zillow, and Canva back in October. Opening submissions more broadly is the signal. This has moved past partner demos. OpenAI wants an ecosystem.
What the SDK probably looks like in practice
OpenAI hasn't published a full public spec yet, but the shape is familiar from earlier tool-calling systems.
You define actions with structured inputs and outputs. Think JSON Schema, not loose prompts. ChatGPT decides when an action fits, fills the parameters as best it can, calls your backend, and folds the result into the conversation.
A simple action might look like this:
{
  "name": "create_slide_deck",
  "description": "Generate a 10-slide presentation from an outline",
  "parameters": {
    "type": "object",
    "properties": {
      "outline": { "type": "string", "minLength": 10 },
      "brand_theme": { "type": "string", "enum": ["default", "dark", "light"] },
      "export_format": { "type": "string", "enum": ["pptx", "pdf"] }
    },
    "required": ["outline", "export_format"],
    "additionalProperties": false
  }
}
That will look routine if you've shipped tool-calling against OpenAI, Anthropic, or Google APIs. It's also where plenty of apps will break.
Loose schemas force the model to guess. Guessing leads to bad parameters, unnecessary follow-up questions, and flaky UX. If you want reliability, narrow the space. Use enums. Set bounds. Reject junk cleanly. Don't try to expose your entire product as one giant action.
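As a sketch of what "reject junk cleanly" can look like in practice, here's server-side validation of that same action using Ajv. The handler name and error shape are illustrative, not part of the SDK:

import Ajv from "ajv";

const ajv = new Ajv({ allErrors: true });

// Mirror the action schema server-side; never assume the platform validated it.
const validateArgs = ajv.compile({
  type: "object",
  properties: {
    outline: { type: "string", minLength: 10 },
    brand_theme: { type: "string", enum: ["default", "dark", "light"] },
    export_format: { type: "string", enum: ["pptx", "pdf"] },
  },
  required: ["outline", "export_format"],
  additionalProperties: false,
});

// Illustrative handler: fail fast with a machine-readable reason the model can act on.
function handleCreateSlideDeck(rawArgs: unknown) {
  if (!validateArgs(rawArgs)) {
    return { error: "NEEDS_CLARIFICATION", details: validateArgs.errors };
  }
  // ...safe to proceed with typed, bounded arguments...
}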
The best early ChatGPT apps will probably be narrow and opinionated. A few sharp actions. Tight contracts. Predictable responses.
Getting called is the easy part
A lot of teams fixate on invocation and neglect everything after it.
Once ChatGPT starts calling your service inside a user-facing turn, ordinary backend discipline matters even more.
Latency is exposed
If the model spends a few seconds reasoning and your API spends a few seconds responding, users feel the whole wait. There's no separate page load to hide behind. OpenAI hasn't published a universal latency limit, but if you're north of a second at p95 for a common action, it'll probably feel slow.
For practical purposes:
- target sub-800ms p95 where you can
- avoid cold starts
- cache hot lookups
- precompute results for frequent, narrow tasks
- use async patterns for long-running jobs
If a task takes longer, don't pretend it's synchronous. Return a job_id, stream updates, or let the system poll. A slow action inside a chat loop gets annoying fast.
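One way to structure that, as a sketch. The in-memory job store and action names are placeholders; a real version would use a durable queue:

import { randomUUID } from "node:crypto";

type JobStatus = "queued" | "running" | "done" | "failed";
const jobs = new Map<string, { status: JobStatus; result?: string }>();

// Respond within the chat turn; the assistant (or user) polls get_job_status later.
async function createSlideDeck(args: { outline: string }) {
  const jobId = randomUUID();
  jobs.set(jobId, { status: "queued" });
  renderDeck(jobId, args).catch(() => jobs.set(jobId, { status: "failed" }));
  return { job_id: jobId, status: "queued", poll_action: "get_job_status" };
}

async function getJobStatus(args: { job_id: string }) {
  return jobs.get(args.job_id) ?? { status: "failed" as const };
}

async function renderDeck(jobId: string, args: { outline: string }) {
  jobs.set(jobId, { status: "running" });
  // ...long-running render work elided...
  jobs.set(jobId, { status: "done", result: "https://example.com/deck.pptx" });
}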
Your API has to handle ambiguity
Models fill arguments probabilistically. Even with strong schemas, you'll get missing values, malformed dates, contradictory locations, and cases where the human intent was clear but the action payload wasn't.
You need a structured way to say: I can't do this yet, ask for clarification.
A stable error taxonomy helps:
- USER_AUTH_REQUIRED
- NEEDS_CLARIFICATION
- RATE_LIMIT
- TEMPORARY_UNAVAILABLE
That's a lot better than returning an HTTP 400 with a paragraph of internal diagnostics. The model can work with readable categories. A stack trace is useless.
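A sketch of a response envelope built around that taxonomy; the field names are illustrative:

type ActionErrorCode =
  | "USER_AUTH_REQUIRED"
  | "NEEDS_CLARIFICATION"
  | "RATE_LIMIT"
  | "TEMPORARY_UNAVAILABLE";

interface ActionError {
  code: ActionErrorCode;
  // Short, user-safe message the model can relay or act on.
  message: string;
  // Optional hint telling the model exactly what to ask the user for.
  missing_fields?: string[];
}

function needsClarification(fields: string[]): ActionError {
  return {
    code: "NEEDS_CLARIFICATION",
    message: `Missing or ambiguous: ${fields.join(", ")}. Ask the user to confirm.`,
    missing_fields: fields,
  };
}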
Idempotency matters
If your app books travel, places orders, sends invoices, or touches money, retries are dangerous. LLM systems retry. Networks retry. Platforms retry.
Use request IDs. Make create operations idempotent. Don't let a chat hiccup create two charges or two reservations.
Obvious, yes. Still easy to miss.
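A minimal sketch of the pattern, assuming a stable request ID from the platform or your gateway. The in-memory map stands in for a durable store with a unique constraint:

const processed = new Map<string, { orderId: string }>();

async function createOrder(requestId: string, payload: { sku: string; qty: number }) {
  // Retry from the model, the platform, or the network: return the original result.
  const prior = processed.get(requestId);
  if (prior) return prior;

  const order = await chargeAndPersist(payload);
  processed.set(requestId, order);
  return order;
}

async function chargeAndPersist(payload: { sku: string; qty: number }) {
  // A real implementation writes the request ID and the order in one transaction.
  return { orderId: "ord_" + Math.random().toString(36).slice(2) };
}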
Security doesn't get softer because the UI feels conversational
The chat interface feels casual. The security requirements don't.
OpenAI is expected to use OAuth 2.0, likely with PKCE, for user consent. That's standard. Scope design is where teams get sloppy. If your app only needs read access for one action, don't ask for full account access because it's easier to wire up.
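For illustration, a standard OAuth 2.0 authorization request with PKCE and a deliberately narrow scope. The endpoint and scope names are hypothetical:

import { createHash, randomBytes } from "node:crypto";

// PKCE: the verifier stays server-side; only the hashed challenge goes in the URL.
const verifier = randomBytes(32).toString("base64url");
const challenge = createHash("sha256").update(verifier).digest("base64url");

const authUrl = new URL("https://auth.example.com/oauth/authorize"); // hypothetical IdP
authUrl.searchParams.set("client_id", "your-client-id");
authUrl.searchParams.set("response_type", "code");
authUrl.searchParams.set("scope", "decks:read"); // only what this action needs, not full account access
authUrl.searchParams.set("code_challenge", challenge);
authUrl.searchParams.set("code_challenge_method", "S256");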
Prompt injection is the other issue everyone brings up, and in this case the concern is justified. If your app accepts free-form text from the model and passes it straight into brittle downstream systems, you've created a decent exploit path.
Treat all model-supplied text as untrusted input. Validate against schemas. Apply server-side policy checks. Escape dangerous content where needed. Never execute commands just because the model phrased them confidently.
Watch your outputs too. If ChatGPT may show your response directly to the user, don't return internal IDs, hidden fields, or secrets that were meant only for your service layer.
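A small sketch of that output discipline, with hypothetical field names. Allow-list what leaves your service rather than deny-listing what shouldn't:

interface InternalBooking {
  confirmation_code: string;
  hotel_name: string;
  internal_rate_plan_id: string; // service-layer only
  fraud_score: number;           // service-layer only
}

function toUserFacing(b: InternalBooking) {
  // Allow-list, never deny-list: new internal fields stay private by default.
  return { confirmation_code: b.confirmation_code, hotel_name: b.hotel_name };
}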
Data minimization matters because the platform is mediating context. Ask for location if the action needs location. Ask for files if it needs files. Don't ask for broad data access because your SaaS app usually gets broad permissions.
Observability becomes product infrastructure
If your app starts getting real usage inside ChatGPT, debugging gets weirder.
The user sees one conversation. Under the hood, you may have:
- model reasoning
- tool selection
- auth handoff
- your API gateway
- queues
- external vendor calls
- a final model response
Without solid tracing, support turns into guesswork. Correlation IDs are table stakes. Log the action name, schema version, validation failures, auth state, and latency at each hop. Build replayable eval sets from real prompts so you can see where the model misfills parameters or calls the wrong action.
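As a sketch, a per-call trace record covering those fields. The names are illustrative, and any structured logger works:

interface ActionTrace {
  correlation_id: string;
  action: string;
  schema_version: string;
  auth_state: "anonymous" | "authorized" | "expired";
  validation_ok: boolean;
  latency_ms: Record<string, number>; // per hop: gateway, queue, vendor, total
}

function logTrace(t: ActionTrace) {
  // Structured JSON logs make it trivial to rebuild one conversation's call chain.
  console.log(JSON.stringify(t));
}

logTrace({
  correlation_id: "c0ffee-42",
  action: "create_slide_deck",
  schema_version: "2024-06-01",
  auth_state: "authorized",
  validation_ok: true,
  latency_ms: { gateway: 12, vendor: 480, total: 510 },
});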
This is where mature teams will separate themselves from demo builders. The directory will reward products that behave consistently, not products with the slickest launch video.
The business side is fairly straightforward
If your users already spend time in ChatGPT, this is a real distribution opportunity. Maybe not for every product, but for a lot more than travel and consumer search.
The sweet spot looks like this:
- high-intent tasks
- short workflows
- clear outcomes
- APIs that already exist
- value delivered in one or two calls
Think procurement, CRM lookups, internal knowledge actions, reservation systems, coding helpers, analytics summaries, invoice creation, or document generation.
Enterprise buyers will start asking vendors whether they have a ChatGPT app for the same reason they asked about Slack integration or SSO. Sometimes that's checkbox stuff. Sometimes it's real demand. Either way, product teams will need an answer.
The bigger pressure point is cross-assistant support. Microsoft Copilot, Google Gemini, and OpenAI are all moving toward assistant-mediated tool ecosystems. Nobody wants to maintain three completely different integration stacks forever. Expect adapter layers and abstraction tooling to become a real category in 2026.
Monetization is still hazy. OpenAI hasn't detailed revenue share or paid placement yet. For now, assume the incentive is usage and distribution, not direct store economics. That may change quickly if the directory starts sending meaningful traffic.
What developers should do in the next 60 days
Don't port your whole app. Pick a narrow slice that works well in conversation.
A good first version usually has:
- one to three high-value actions
- strict schemas
- compact structured responses (see the sketch after this list)
- short summaries the model can quote back to the user
- solid auth and traceability
- fast failure paths when context is missing
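Here's what a compact response with a quotable summary might look like; the field names are illustrative:

interface CreateDeckResponse {
  deck_url: string;
  slide_count: number;
  export_format: "pptx" | "pdf";
  // Keep this short and factual; the model will often surface it verbatim.
  summary: string;
}

const response: CreateDeckResponse = {
  deck_url: "https://example.com/decks/abc123.pptx",
  slide_count: 10,
  export_format: "pptx",
  summary: "Created a 10-slide PPTX deck from your outline.",
};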
If you're building from scratch, start with workflows where a user would naturally type a request instead of opening a dashboard. That's the filter.
If you already support function calling in your stack, this won't feel alien. The difference is distribution. OpenAI is offering placement inside one of the biggest AI interfaces on the market, and that matters.
It also means your API is no longer sitting behind your UI and its careful step-by-step flow. It's exposed to a model that improvises. Build for that.
What to watch
The caveat is that agent-style workflows still depend on permission design, evaluation, fallback paths, and human review. A demo can look autonomous while the production version still needs tight boundaries, logging, and clear ownership when the system gets something wrong.