Artificial intelligence June 5, 2026

Anthropic embeds engineers at NSA to deploy Mythos for cyber operations

Anthropic’s Mythos is reportedly headed deeper into NSA cyber work

Anthropic has reportedly sent roughly half a dozen engineers to the National Security Agency to help the agency use Mythos, the company’s frontier cybersecurity model.

That detail, first reported by the Financial Times and cited by TechCrunch, matters because Mythos is a cyber-focused AI system that Anthropic has previously described as capable enough to warrant restricted access. The NSA is also an unusually sensitive customer. It collects foreign intelligence through wiretaps, undersea cable access, corporate partnerships, and other classified channels, and it runs offensive cyber operations against foreign targets.

The reporting doesn’t establish whether Anthropic engineers or Mythos are directly involved in active hacking operations. The NSA declined to confirm or deny the report. Anthropic did not respond to TechCrunch’s request for comment.

The outline is still uncomfortable: a frontier AI lab that has publicly warned about misuse of a cyber model is reportedly placing staff with the US signals intelligence agency to help put that model to work.

Why Mythos matters

Most developers already understand how useful LLMs can be for security-adjacent work. They can explain CVEs, summarize exploit write-ups, generate detection logic, translate malware snippets between languages, or draft proof-of-concept code. Plenty of that is available today in general-purpose coding models.

Mythos appears to be built for heavier cyber work.

Anthropic has said it limited access to Mythos because of its ability to find security flaws and assist with cyber operations. That points to a model trained or tuned for tasks such as vulnerability discovery, exploit reasoning, threat analysis, reverse engineering, and possibly chaining weaknesses across systems.

The difference matters. A general model might help a security engineer understand why a deserialization bug is dangerous. A specialized cyber model may be better at:

  • identifying vulnerable code paths across large repositories
  • turning a bug into a working exploit strategy
  • reasoning through privilege escalation steps
  • mapping attack surfaces from documentation, configs, and logs
  • producing detection queries for SIEM or EDR systems
  • triaging noisy scan output into useful leads
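
To make the last item concrete: triage usually starts with plain deduplication and ranking, before any model is involved. The sketch below is hypothetical. The `Finding` fields, the severity scale, and the `triage` function are assumptions for illustration, not part of Mythos or any scanner's API.

```python
from dataclasses import dataclass

# Assumed severity scale; real scanners use their own labels.
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "info": 0}


@dataclass(frozen=True)
class Finding:
    host: str
    rule_id: str
    severity: str
    internet_facing: bool = False


def rank(finding):
    # Unknown severity labels sort last instead of raising.
    return SEVERITY_RANK.get(finding.severity, 0)


def triage(findings, limit=10):
    """Collapse duplicate findings and return the highest-priority leads."""
    unique = {}
    for f in findings:
        key = (f.host, f.rule_id)
        current = unique.get(key)
        # Keep the most severe report for each (host, rule) pair.
        if current is None or rank(f) > rank(current):
            unique[key] = f
    # Severity first, then exposure: internet-facing hosts break ties.
    ordered = sorted(
        unique.values(),
        key=lambda f: (rank(f), f.internet_facing),
        reverse=True,
    )
    return ordered[:limit]
```

A model would plausibly sit after a step like this, explaining or grouping the shortlist rather than reading every raw scanner line.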

Mythos should not be treated as a magic exploit machine. Models still hallucinate, misunderstand runtime behavior, miss environmental constraints, and produce brittle code. Exploitation often depends on exact versions, memory layout, compiler behavior, mitigations, network topology, credentials, and luck. But an imperfect model can still change the economics of cyber work if it cuts hours from reconnaissance, triage, or exploit development.

That’s enough to explain why governments want access.

The procurement mess

The Mythos report lands in the middle of a strange fight between Anthropic and parts of the US government.

Axios reported in April that the NSA was using Mythos despite a federal ban on using Anthropic’s technology. That ban followed the Department of Defense’s move to designate Anthropic a “supply chain risk,” reportedly in retaliation for the company refusing to allow its models to be used for mass domestic surveillance and autonomous weapons.

If those reports are accurate, the situation is messy even by federal AI standards. One part of the government labels the vendor a supply chain risk. Another reportedly uses the vendor’s most sensitive cyber model anyway. Now the FT says Anthropic engineers are helping the NSA use Mythos.

There are possible explanations that don’t require conspiracy. National security waivers exist. Classified exceptions exist. Procurement rules often lag operational demand. Agencies can disagree about risk. But for developers and security leaders watching the AI market, the contradiction is instructive: policy positions around frontier AI can soften once a tool becomes operationally attractive.

That applies outside government too. Companies publish careful acceptable-use policies, then face pressure from large customers who want exceptions. Cybersecurity products sharpen that tension because the same capability can support defense, red teaming, surveillance, or intrusion.

Cyber models are built for dual use

A model that can find vulnerabilities can help patch them. It can also help exploit them.

That’s the dual-use problem in AI security, and it’s not hypothetical. A defensive team auditing an internal service can use a workflow that looks a lot like an offensive actor scanning third-party systems:

  1. ingest code, docs, configs, and exposed service metadata
  2. identify likely weak points
  3. generate hypotheses about exploitability
  4. test those hypotheses
  5. refine based on errors and system responses
  6. package the result into a usable report or exploit chain

The quality of each step matters. Current models are uneven, but they’re already useful in the middle of that loop. They’re good at reading large volumes of semi-structured technical material. They’re good at summarizing. They’re good at generating plausible code and queries. They’re getting better at tool use, especially when connected to scanners, debuggers, fuzzers, symbolic execution tools, cloud APIs, and CI systems.

That tool integration is where the practical impact shows up. A cyber model in a chat window has limited value. A cyber model connected to nmap, Burp Suite, CodeQL, Semgrep, Ghidra, Kubernetes audit logs, cloud IAM graphs, and a ticketing system becomes much more useful. It can coordinate tools that already exist.

For an agency like the NSA, the value may come from accelerating expert teams rather than replacing them. A model could help analysts sift intercepted traffic, correlate infrastructure, summarize malware families, inspect firmware, generate YARA or Sigma rules, or search for exploit paths in foreign systems. Small productivity gains matter when the target set is large and the talent pool is limited.

The human-in-the-loop detail matters

The FT report says Anthropic sent engineers to help the NSA use Mythos. That detail matters because advanced AI deployments rarely work by handing over an API key.

For a cyber model to be useful inside an intelligence agency, someone has to solve hard integration problems:

  • access controls for classified or compartmented data
  • audit logging that doesn’t leak sensitive operations
  • model hosting constraints, including air-gapped or government-controlled environments
  • prompt and tool policies for offensive versus defensive tasks
  • evaluation datasets that reflect real mission work
  • guardrails that don’t block legitimate classified use
  • incident response if the model produces unsafe or false output
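
One way to approach the audit-logging item is to record keyed digests of prompts and outputs instead of the text itself. The following is a minimal sketch under stated assumptions: the `audit_record` function, its field names, and the key handling are hypothetical, and nothing here reflects how the NSA or Anthropic actually log model use.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def audit_record(key, user, tool, prompt, output):
    """Build an audit entry that references content by keyed digest, not by value."""

    def digest(text):
        # HMAC rather than a bare hash, so short or guessable prompts
        # cannot be recovered by hashing candidate strings without the key.
        return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "prompt_hmac": digest(prompt),
        "output_hmac": digest(output),
        "prompt_chars": len(prompt),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    entry = audit_record(b"demo-key", "analyst1", "semgrep", "example prompt", "example output")
    print(json.dumps(entry, indent=2))
```

An auditor holding the key can later confirm that a given prompt matches a log entry, while the log alone reveals only who called which tool and when.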

Embedding engineers suggests Mythos needs customization, operational tuning, or workflow design. That’s normal. It also means vendor staff may end up closer to sensitive government use cases than public policy language suggests.

There’s risk on both sides. Anthropic gains technical feedback from elite cyber operators, which could improve the model. It also risks being pulled into uses that conflict with its stated safety posture. The NSA gains access to frontier tooling, but it also takes on vendor dependency in a domain where reliability, provenance, and control matter.

A model that suggests a bad exploit path wastes time. A model that mishandles classified inputs creates a different problem. A model that confidently misattributes infrastructure could send analysts in the wrong direction. In cyber operations, false confidence is expensive.

What engineers should watch

For senior developers, security engineers, and technical leads, the Mythos story points to where AI security tooling is going.

The first wave of AI coding tools focused on autocomplete, code generation, and documentation. The next wave is moving into specialized operational work: vulnerability research, incident response, malware analysis, secure code review, cloud posture analysis, and attack simulation.

That will show up in enterprise products soon, if it hasn’t already. Vendors will sell AI security agents that can open pull requests, prioritize vulnerabilities, explain exploitability, generate tests, and plug into CI/CD. Some will be useful. Some will be thin wrappers around general models with scary dashboards.

Technical buyers should ask harder questions than “does it use AI?”

Useful questions include:

  • What data does the model need to inspect?
  • Can it run in a private environment?
  • Are prompts, outputs, and tool calls logged?
  • Can logs be disabled, retained, or exported for audit?
  • How does the system handle secrets, credentials, and customer data?
  • Does it generate proof-of-concept exploit code?
  • Can policies distinguish between internal assets and third-party targets?
  • What evaluation data supports the vendor’s accuracy claims?
  • How often does it produce false positives or unsafe recommendations?
  • Can humans approve tool actions before execution?

That last point is boring and important. Autonomous security agents sound attractive until they start running scanners against production, changing firewall rules, or filing hundreds of low-quality tickets. In most organizations, the near-term sweet spot is assisted analysis with strong review points, not free-running agents.
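
A review point like that can be small. The sketch below combines two of the questions above: it denies any action aimed outside a known internal range and holds state-changing actions until a human approves them. The network ranges, the `ToolAction` fields, and the `decide` function are assumptions made for illustration, not any vendor's API.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical internal ranges; a real deployment would load these
# from an asset inventory rather than hard-coding them.
IN_SCOPE = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


@dataclass
class ToolAction:
    tool: str
    target: str      # IP address the tool would touch
    read_only: bool  # False if the action changes state


def in_scope(target):
    addr = ipaddress.ip_address(target)
    return any(addr in net for net in IN_SCOPE)


def decide(action, approver=None):
    """Return 'deny', 'allow', or 'needs_approval' for a proposed tool action."""
    if not in_scope(action.target):
        return "deny"  # third-party targets are never touched
    if action.read_only:
        return "allow"
    if approver is None:
        return "needs_approval"  # queue for a human reviewer
    return "allow" if approver(action) else "deny"
```

Here `approver` stands in for whatever review step an organization already has, such as a ticket sign-off or a chat prompt to the on-call engineer.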

Performance matters too. Cyber workflows involve large repositories, long logs, binary artifacts, packet captures, and sprawling cloud configs. A model’s context window is only part of the story. Retrieval quality, indexing latency, permissions filtering, and tool orchestration often decide whether the product works at scale. If a system can’t respect repo boundaries, tenant isolation, or least-privilege access, it’s not ready for serious security work.
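
Permissions filtering is easiest to reason about when it runs before ranking, so content the caller cannot see never competes for a slot in the model's context. A minimal sketch follows, assuming a hypothetical `Chunk` record and `retrieve` function; real systems would enforce the same rule inside the index or vector store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    repo: str
    tenant: str
    text: str
    score: float  # relevance score from an upstream retriever (assumed)


def retrieve(candidates, tenant, allowed_repos, top_k=5):
    """Filter by tenant and repo permissions first, then rank what remains."""
    visible = [
        c for c in candidates
        if c.tenant == tenant and c.repo in allowed_repos
    ]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]
```

Filtering after ranking would look similar on the surface, but a high-scoring chunk from another tenant could then push permitted results out of the top-k before being dropped.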

The policy line is harder in cyber

Anthropic has built much of its brand around AI safety, controlled deployment, and refusal policies. That doesn’t prevent it from serving government customers, but Mythos puts the company in a tighter spot.

Cybersecurity is one of the hardest areas for clean policy. Refusing “offensive” use sounds clear until a government agency says the same capability is needed to understand adversary infrastructure, validate defenses, or preempt attacks. Red teams write exploits. Defenders simulate intrusions. Intelligence agencies do both, along with activities companies usually avoid discussing in public.

The line between defensive research and offensive preparation often sits in the authorization, target, and intent, not in the technical artifact. A buffer overflow analysis doesn’t reveal whether it supports a patch, a detection rule, or an intrusion campaign.

That ambiguity doesn’t excuse weak governance. It raises the standard. Frontier cyber models need clear access controls, monitored tool use, strong evaluation, and real accountability around deployment contexts. “Trust us” isn’t enough from an AI lab or an intelligence agency.

The public record still has gaps. We don’t know which Mythos capabilities the NSA is using. We don’t know whether the model is deployed in classified environments, connected to operational tooling, or limited to analysis and training. We don’t know how Anthropic reconciles this work with the reported federal ban or the Pentagon’s earlier supply-chain designation.

Those details matter.

For now, the signal is clear enough: frontier AI labs and intelligence agencies are moving from abstract discussions about cyber capability into hands-on deployment. Developers won’t see the NSA’s workflows, but they’ll see the commercial echo. AI security tools are going to get sharper.
