What differentiates AI red teaming from traditional penetration testing?

AI red teaming uses autonomous agents to simulate multi-step attacks involving prompts, tool calls, and model sessions, unlike manual pentests focused on known vulnerabilities.

How can secure coding agents reduce noisy findings for developers?

By using repository context and semantic analysis to propose patches that compile, pass tests, and respect framework conventions.

Why is policy-as-code important for AI security?

It enables enforceable controls around prompt content, tool permissions, and session roles, similar to Kubernetes and cloud IAM admission policies.

Artificial Intelligence December 26, 2025

The security startups from Startup Battlefield that actually track new attack surfaces

TechCrunch’s Startup Battlefield surfaced a useful cluster of security companies this week, and the pattern is clear. The better ones aren’t slapping AI onto old product categories. They’re built around a simpler fact: models, agents, and synthetic m...

Nine cybersecurity startups show where AI security is actually going

That changes what defenders need to watch.

The nine startups drawing attention are AIM, Corgea, CyDeploy, Cyntegra, HACKERverse, Mill Pond, Polygraf AI, TruSources, and Zest. The categories vary, but the themes line up: AI red teaming, secure coding agents, digital twins for infrastructure, ransomware recovery, unmanaged AI discovery, smaller security-tuned models, deepfake detection, and cloud risk correlation.

For senior engineers and security leads, the startup list matters less than the shift under it. Security teams now need controls that understand prompts, tool calls, model sessions, and media authenticity alongside the usual CVEs, IAM drift, and endpoint telemetry.

Security is getting model-aware

A few years ago, “AI for security” mostly meant alert triage, anomaly detection, or better phishing filters. Useful enough, but familiar.

This batch points somewhere else. AIM and HACKERverse focus on active validation. They use autonomous attack agents to probe enterprise systems, including AI systems, for weak guardrails, exploitable tool access, and multi-step attack paths. That fits the way modern attacks actually work. Attackers chain prompts, identities, APIs, SaaS connectors, and cloud permissions.

If you run internal copilots, retrieval pipelines, or agentic workflows, this is immediate. Traditional AppSec tools won’t tell you much about a prompt injection that tricks an agent into calling process.exec, pulling secrets from a connected tool, or pivoting through an over-permissioned plugin.

That helps explain why the model gateway is turning into a real control point. Expect policy checks around:

prompt content
tool-use permissions
session roles
data classification
token- and request-level telemetry

The source material’s OPA/Rego example is simple, but directionally right. Blocking fs.write or process.exec unless a session is explicitly privileged is the kind of blunt, enforceable control that works better than vague “be safe” prompt instructions. Security teams will need more of that. Policy-as-code for AI systems looks headed for the same maturity curve already seen with Kubernetes, cloud IAM, and admission controllers.

Corgea and the hard part of AI coding tools

Corgea is working on the development side, using AI agents to find code flaws and propose fixes. Crowded space, yes. The hard part is still real.

Most code security tooling still leads to one of two bad outcomes: noisy findings developers ignore, or clean-looking fixes that break application logic. Secure coding assistants only matter if they understand framework conventions, auth flows, state transitions, and the codebase they’re touching. Finding weak crypto or a missing CSRF check is table stakes. Producing a patch that compiles, passes tests, and doesn’t quietly wreck authorization is harder.

That’s why repository context and semantic analysis matter more than model size. A security coding agent that understands your OAuth2/OIDC setup, middleware chain, and deployment assumptions is far more useful than a generic assistant that produces “sanitized” pseudocode.

There’s still an obvious risk. AI-generated security fixes can introduce subtle regressions, especially in older systems with strange invariants. Teams using tools like this should treat them as fast review assistants, not auto-merge machines. Tight CI, security tests, and code ownership rules still carry the load.

Digital twins, used properly

CyDeploy stands out because “digital twin” is usually abused until it means nothing. Here, it’s practical.

If the system can discover assets across networks, containers, services, and SaaS dependencies, then map those relationships as a graph, you get a safe place to test changes before they hit production. Patch rollout. Network policy updates. Segmentation changes. Failover paths. Plenty of security incidents start with somebody breaking production while trying to secure it.

The plumbing matters. Agentless discovery through network sensors or eBPF, then graphing identities and dependencies, gives you a model that’s live enough to rehearse against. It won’t perfectly mirror production. Nothing does. But it can catch obvious foot-guns before they knock over a revenue service at 2 a.m.

This is a good example of security and platform engineering converging for sane reasons. In a large enough environment, change risk is a security problem.

Cyntegra is betting on recovery

Cyntegra’s pitch is ransomware resilience through hardware plus software, with isolated backups and fast full-system restoration. Good. More security companies should care about recovery time instead of pretending perfect prevention is available for purchase.

A useful ransomware product in 2026 needs a few things:

immutable or signed snapshots
isolation from the primary OS
protected credential recovery
a restore path an attacker on the main system can’t tamper with

If Cyntegra can actually restore the OS, apps, data, and credentials in minutes, that’s compelling. Fast, clean restoration breaks the economics of ransomware.

The trade-off is operational complexity. Hardware roots of trust, out-of-band recovery channels, and tightly controlled restore workflows can be painful in mixed environments. Legacy infrastructure always ruins the clean architecture slide. Still, resilience is finally getting the attention detection got for the last decade.

Shadow AI is now an enterprise category

Mill Pond is focused on unmanaged AI use by employees, and that market is going to grow whether anyone likes the label or not.

Every company already has some version of this problem. Engineers paste logs into public models. Sales teams upload customer notes. Finance asks a chatbot to summarize contract language. Sometimes nothing sensitive leaks. Sometimes source code, PII, or internal prompts end up in systems with weak policy coverage.

Detection will probably come from DNS logs, proxies, endpoint telemetry, and known AI endpoint catalogs. The harder problem is classification. Blocking all model use is lazy and usually ineffective. Companies need to separate harmless drafting from work that touches secrets, regulated data, customer records, or proprietary code.

That means these tools need to plug into the systems security teams already use, including DLP, CASB, and SIEM pipelines. If they don’t, they become another dashboard nobody opens.

Small models for security make sense

Polygraf AI is building small language models tuned for security tasks. Less flashy than a giant general model, which may be the point.

Security work often values predictability, low latency, local inference, and data control over broad creative range. A smaller model trained for policy parsing, log triage, compliance checks, or deepfake cues can be easier to deploy on-prem and cheaper to run continuously. Quantization and distillation help, but the bigger advantage is narrow scope. Security teams don’t need a model that can write sonnets. They need one that can classify an event stream reliably and keep data inside the boundary.

There are limits. SLMs can struggle with long-context reasoning and unfamiliar edge cases. But for tightly bounded tasks, they’re often a better operational fit than shipping everything to a huge external model and hoping legal approves it.

Deepfake detection is moving into the auth stack

TruSources focuses on real-time deepfake detection across audio, video, and images. That can sound niche until you look at where fraud is headed.

KYC, remote identity verification, customer support escalation, executive impersonation, and high-risk transaction approval are all obvious targets. If someone can convincingly spoof a voice call or live video session, traditional identity checks start looking thin.

The technical requirements are tough. Detection has to happen fast, often under 200 ms in authentication flows, and it has to work on streamed media. That pushes teams toward GPU-efficient multimodal pipelines looking for spectral audio artifacts, frame inconsistencies, blinking anomalies, and compression fingerprints. Pairing detection with C2PA provenance signals could help, though provenance is only useful when content is signed upstream and preserved through the toolchain. Often it won’t be.

Deepfake detectors also face a basic problem: adversaries adapt. Any vendor promising permanent superiority is selling fantasy. The value is layered friction, continuous retraining, and putting these checks inside workflows where they can actually block fraud.

Zest and the next version of cloud vulnerability management

Zest tackles cloud vulnerability management, but the interesting part is the unified view across cloud posture, application risk, IaC, SBOMs, and runtime context.

That’s where this market needs to go. Security teams don’t need longer issue lists. They need correlation. A medium-severity vulnerability on an internal system with no attack path is one thing. A known exploitable flaw on an internet-facing service tied to an over-privileged role and a sensitive data store is another.

If Zest can rank risk using CVSS, exploit availability, asset exposure, business criticality, and attack path analysis, that’s useful. If it just repackages alerts from existing scanners into one prettier pane, it won’t matter.

What technical leaders should take from this

Three buying priorities are getting clearer.

First, validate AI systems with offensive testing. Don’t assume your guardrails work because a vendor demo looked clean. Test prompts, tool permissions, session boundaries, and retrieval paths continuously.

Second, treat AI use discovery as basic hygiene. Shadow AI is already in your environment whether you approved it or not.

Third, push security products to prove operational fit. Recovery time. Policy enforcement points. Latency under load. False positive rate. Integration into existing pipelines. Those details separate a useful product from startup theater.

The strongest companies in this group are focused on proof. Security buyers have heard enough promises. They want systems that survive contact with production.

Keep going from here

Useful next reads and implementation paths

If this topic connects to a real workflow, these links give you the service path, a proof point, and related articles worth reading next.

Relevant service

AI automation services

Design AI workflows with review, permissions, logging, and policy controls.

Related proof

Marketplace fraud detection

How risk scoring helped prioritize suspicious marketplace activity.

Google outlines a security model for Chrome’s upcoming agentic features

Google has started laying out how Chrome will control its upcoming agentic features. The notable part is the posture. Google is treating the model in the browser as a risky actor that needs supervision. That’s the right call. Once a browser agent can...

Anthropic previews Mythos, an AI model for finding zero-day vulnerabilities

Anthropic is previewing a new model called Mythos for a job with real stakes: finding software vulnerabilities before attackers do. The company says Mythos has already found thousands of zero-day vulnerabilities, including bugs in codebases that have...

OpenAI acqui-hires Roi founder as it expands consumer AI efforts

OpenAI has acqui-hired Sujith Vishwajith, the CEO and co-founder of Roi, a New York startup that built an AI personal finance app around user-specific context. Roi is shutting down on October 15. Only Vishwajith is joining OpenAI. Deal terms weren’t ...