OpenAI publishes open source teen safety prompts for age-aware AI apps
OpenAI has published an open source set of teen safety policy prompts on GitHub, built with Common Sense Media and everyone.ai, to help developers add age-aware guardrails to AI apps faster. The release targets teams building chatbots, tutors, creati...
OpenAI’s new teen safety prompts give developers a solid starting point
OpenAI has published an open source set of teen safety policy prompts on GitHub, built with Common Sense Media and everyone.ai, to help developers add age-aware guardrails to AI apps faster. The release targets teams building chatbots, tutors, creative tools, and general-purpose assistants that may end up serving minors.
That matters because a lot of teams still handle youth safety with a pile of moderation calls, vague system prompts, and product rules scattered across the stack. OpenAI is trying to make that more legible. The prompts cover graphic violence and sexual content, harmful body ideals and behaviors, dangerous challenges, romantic or violent roleplay, and age-restricted goods and services. They’re model-agnostic in principle, but tuned for OpenAI’s open-weight safety model, gpt-oss-safeguard.
It’s a useful release. It also leaves plenty of work on the product team. Prompt packs won’t fix weak age detection, sloppy product design, or a model that folds after four turns of pressure. But for developers who need a defensible baseline, this is one of the more practical safety releases OpenAI has put out.
Why this matters
The hard part of safety work usually isn’t agreeing on the policy. It’s turning policy into rules a system can actually enforce.
That’s where teams get into trouble. “Protect minors” sounds fine in a policy doc. It gets murkier when a user asks for calorie restriction advice, romantic roleplay involving a teen character, or instructions for stealing alcohol. That gap between principle and implementation is where a lot of AI products break down.
OpenAI’s prompt pack tries to close some of that gap by publishing policy in a form engineers can drop into real request flows. That’s the interesting part. The company is treating policy-as-prompt as an implementation layer instead of leaving it as a trust-and-safety talking point.
For startups and smaller product teams, that can save real time. For bigger companies, it creates a shared artifact that legal, policy, and engineering can all inspect without translating for each other.
It also fits OpenAI’s broader push around under-18 safety over the past year, including parental controls, age prediction work, and changes to its Model Spec around teen behavior.
What the stack looks like
If you’re already running moderation, none of this is especially exotic. What OpenAI has done is package a specific workflow.
A sensible implementation looks like this:
-
Route by age context Use whatever age signal you have: account metadata, session history, parental controls, inferred age, device context. Don’t trust self-declared age on its own.
-
Pick a policy prompt Apply a teen policy, mixed-audience policy, or adult policy based on confidence and risk.
-
Screen the input Run
gpt-oss-safeguardor another classifier on the incoming message before it reaches your main model. -
Generate with constrained instructions Merge the app’s system prompt with the selected safety policy. Keep it short. Prompt sprawl adds latency and cost fast.
-
Screen the output Treat model output as untrusted. Run post-generation moderation before the final response goes back to the user.
-
Fallback cleanly If something crosses a threshold, swap in a refusal template, offer a safer alternative, or route to human review for high-risk categories like self-harm.
That structure makes sense. It’s the kind of boring you want in safety infrastructure.
Teams that skip post-generation checks usually regret it. So do teams that dump everything into one giant system prompt and hope for the best. Failures often show up after the model starts improvising around your intent.
gpt-oss-safeguard is the part engineers should pay attention to
The prompt pack is useful, but the open-weight moderation model is what makes this operationally interesting.
OpenAI says gpt-oss-safeguard is optimized for content classification and refusal guidance. Because it’s open-weight, teams can run it privately. That matters for compliance-sensitive apps, especially anything involving teens, health-adjacent content, or schools.
It changes the trade-offs.
If moderation stays inside your own infrastructure, you get tighter data control and less vendor dependence. You also get room to tune thresholds by product surface. A teen tutoring app, a roleplay product, and a social chatbot shouldn’t all use the same sensitivity settings.
There’s still no free pass here. High-recall moderation catches more risky content, but it also blocks more legitimate use. That gets painful fast in educational contexts. A biology tutor should be able to answer questions about eating disorders, abuse, puberty, or sexual health without sliding into harmful guidance or blanket refusal. That balance takes product-specific evaluation, not just policy text.
Open weights help. They don’t remove the judgment call.
Prompt-based policy has limits
There’s a reason companies keep landing on prompt-level policy. It’s portable, readable, and easy to revise. You can test it, diff it, and argue about it in plain language. That’s better than burying safety logic in hidden heuristics spread across services.
But the limits are obvious.
First, it adds overhead. OpenAI’s own guidance suggests safety prompts can add roughly 200 to 600 tokens. That’s not small in a high-volume chat product. Add pre- and post-moderation calls and you’ve introduced both cost and latency. Depending on your setup, two moderation passes can add around 100 to 300 milliseconds. People notice that in chat.
Second, prompts are visible abstractions. Attackers can study them, probe the edges, and build jailbreaks against them. Open publication makes outside scrutiny easier, which is good. It also gives adversaries cleaner targets.
Third, prompt rules weaken under conversational pressure. A single-turn moderation benchmark can look fine while the product still fails in multi-turn sessions. A user who starts with “write a story” and spends three exchanges steering toward sexual content involving a teen persona is testing the whole system, not one classifier pass.
That’s why regression testing matters more than prompt elegance. If you’re shipping this, build or borrow an adversarial suite and run it in CI. Track attack success rate across turns. A lot of failures show up after turn three.
Where teams should be careful
The obvious mistake is treating “teen safe” as one setting.
It isn’t. A writing app, coding assistant, language tutor, and social companion all need different enforcement patterns, even if they share the same taxonomy.
A few design choices matter more than the prompt pack itself.
Policy routing by product surface
If your product has multiple modes, split them. Free-form roleplay needs stricter handling than factual Q&A. Creative writing often needs special treatment because a refusal can accidentally leak unsafe details through “helpful” alternatives.
Output templates for risky categories
When something sensitive gets flagged, don’t always let the main model freestyle the refusal. Use guarded templates. Short, respectful, and boring works better than over-explaining.
Localized crisis resources
If you’re responding to self-harm or substance abuse content, generic advice won’t do. Region-specific resources matter, and your resolver should have an offline fallback. Products tend to fail exactly where teams assume external lookups will always work.
Data retention
This part gets underplayed. If teens may be using your app, don’t keep raw conversation logs by default. Minimize retention windows, hash what you can, encrypt the rest, and be clear about what ends up in evaluation datasets.
Regulatory pressure is part of the story
OpenAI didn’t publish this in a vacuum. Services that reach minors are under tighter scrutiny from several directions: the EU’s Digital Services Act, the UK’s Online Safety Act, and continuing pressure in the US around youth platform protections, including proposals like KOSA.
A shared policy artifact helps with audits. It gives companies something concrete to point to when regulators, partners, or school customers ask how the system behaves for under-18 users. That doesn’t prove the system is safe, but it does make the safety layer easier to inspect.
That’s overdue. A lot of AI safety tooling still depends on opaque filters, scattered product decisions, and institutional lore. Publishing policy logic is healthier than asking people to trust a brand promise.
OpenAI’s credibility problem is still there
This release is useful, but it lands in a context OpenAI hasn’t escaped. The company has faced criticism and legal pressure over cases where ChatGPT interactions were linked to serious harm, including suicide-related allegations. That doesn’t make every new safety tool cosmetic. It does mean people are right to ask whether prompt packs are enough.
They aren’t.
Teen safety is a systems problem. It runs through account design, escalation paths, model behavior, data handling, parental controls, red-teaming, and product incentives. If your product rewards endless engagement and emotionally intense interaction, a clean refusal template won’t cover for that.
Still, having a public baseline beats improvising one badly in production.
What I’d do with this
I’d use the policy prompts as a baseline, then narrow them hard for the product.
I’d run gpt-oss-safeguard before and after generation. I’d default to teen-safe behavior anywhere age is uncertain. I’d create fixed response templates for self-harm, sexual content involving minors, dangerous challenges, and body-image content. I’d measure false positives on legitimate educational queries, because overblocking is how safety features become unusable. And I’d test multi-turn jailbreaks, not just isolated prompts.
Most of all, I’d treat the model’s output as suspect until proven otherwise. That’s still the right mental model.
OpenAI’s new prompt pack won’t make a product safe on its own. It does make one part of the job faster, clearer, and easier to audit. For engineering teams working on a deadline, that’s enough to matter.
Useful next reads and implementation paths
If this topic connects to a real workflow, these links give you the service path, a proof point, and related articles worth reading next.
Add focused AI, data, backend, and product engineering capacity when the roadmap is clear.
How an embedded engineering pod helped ship a delayed automation roadmap.
OpenAI says the Department of Defense will be able to use its models on classified networks, with technical safeguards that OpenAI keeps in place. Sam Altman framed the deal around two boundaries: no domestic mass surveillance, and no handing lethal ...
Moonbounce, a startup founded by former Facebook and Apple trust and safety leader Brett Levenson and Ash Bhardwaj, has raised $12 million to sell a specific piece of infrastructure: a real-time moderation layer that sits between users and AI systems...
Wikipedia’s editors have published something the AI detection industry keeps missing: a practical guide to spotting LLM-written prose that people can actually use. The page is called Signs of AI writing. It grew out of Project AI Cleanup, a volunteer...