Harness raises $240M at a $5.5B valuation to automate the part of software delivery AI still struggles with
AI can write code faster than most teams can safely ship it. That gap costs real money.
Harness has raised $240 million in a Series E at a $5.5 billion valuation, with $200 million in primary capital led by Goldman Sachs and a planned $40 million tender offer for long-tenured employees. The company says it’s on track to pass $250 million in ARR in 2025. For a business built around delivery plumbing, that’s a big number.
The pitch is simple. Code generation is improving fast, but the work after the pull request still eats most of the time. Testing, policy checks, deployment orchestration, approvals, rollback logic, audit trails. All the parts that slow a release down. Harness says nearly 70% of software delivery time sits there.
That sounds believable if you’ve watched a change sit in CI for hours while flaky tests rerun, security scans crawl, and somebody from compliance has to approve a button click.
Where Harness is aiming
Harness isn’t trying to compete with GitHub Copilot on code completion. It wants the control layer around software delivery, where automation is still fragmented and where AI may actually have enough context to make useful decisions.
Its core idea is a software delivery knowledge graph. The term is a little grand, but the underlying model is practical. Instead of asking a general model to reason from tickets, YAML, and loose prompts, Harness builds a structured map of how a company ships software:
- code changes
- services
- tests
- environments
- deployment pipelines
- incidents
- policies
- artifacts
- cost centers
The value is in the links between them. Which PR touched which service. Which tests cover those paths. Which policies apply to that service tier. Which incidents tie back to a rollout. Which APIs changed. Which environments fall under SOX controls.
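Those links are just typed edges between delivery entities. A minimal sketch of what such a graph might look like — the entity and relation names here are illustrative assumptions, not Harness's actual schema:

```python
# Illustrative sketch of a software delivery knowledge graph.
# Entity and relation names are hypothetical, not Harness's schema.
from collections import defaultdict


class DeliveryGraph:
    def __init__(self):
        # (source_node, relation) -> set of destination nodes
        self.edges = defaultdict(set)

    def link(self, src, relation, dst):
        self.edges[(src, relation)].add(dst)

    def related(self, src, relation):
        return self.edges[(src, relation)]


g = DeliveryGraph()
g.link("pr:1234", "touches", "service:payments")
g.link("service:payments", "covered_by", "test:payments_integration")
g.link("service:payments", "governed_by", "policy:sox_controls")
g.link("service:payments", "deployed_to", "env:prod-us")

# Which PR touched which service, and what applies to it:
svc = next(iter(g.related("pr:1234", "touches")))
print(g.related(svc, "covered_by"))   # tests that matter for this change
print(g.related(svc, "governed_by"))  # policies that gate its release
```

The payoff is the traversal: given a PR, you can walk to the affected service, then to its tests, policies, and environments, without a model having to infer those relationships from tickets and YAML.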
That’s a stronger base for delivery automation than dropping a chatbot into Jenkins.
A graph gives an agent enough context to do narrower, higher-value work. It can pick the tests that matter for a change instead of running the whole suite. It can assemble policy gates based on service sensitivity. It can choose canary, blue-green, or a hard stop for human approval.
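The strategy choice in particular is a small, checkable decision once the graph supplies service sensitivity. A sketch of that decision — tier names and rules are invented for illustration, not Harness behavior:

```python
# Hypothetical mapping from service sensitivity to rollout strategy.
# Tier names and rules are illustrative assumptions.
def choose_rollout(service_tier: str, change_touches_schema: bool) -> str:
    """Pick a deployment strategy based on how sensitive the service is."""
    if service_tier == "regulated":
        # Regulated services always stop for human approval.
        return "manual-approval"
    if change_touches_schema:
        # Schema changes get a staged rollout regardless of tier.
        return "blue-green"
    if service_tier == "critical":
        return "canary"
    return "rolling"


print(choose_rollout("regulated", False))  # manual-approval
print(choose_rollout("internal", True))    # blue-green
print(choose_rollout("critical", False))   # canary
```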
That direction makes sense. AI coding tools increase the amount of code entering the system. They do very little to reduce the operational and governance load that comes after.
The product idea matters more than the funding
Harness says its agents can generate and orchestrate pipelines tailored to a customer’s environment, then keep humans in the approval path where needed. That’s the sane design choice. Fully autonomous production changes sound great until one goes sideways.
A plausible flow looks like this:
- A PR lands.
- The agent runs change impact analysis.
- It selects relevant tests instead of the entire suite.
- It adds policy checks based on service type and environment.
- It prepares rollout steps, maybe a Kubernetes canary or staged data migration.
- It runs those steps across tools like kubectl, Helm, Argo CD, and Terraform.
- It gates promotion based on telemetry, error budgets, latency, and failure rates.
- It requests approvals for regulated or high-risk changes.
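The telemetry-gating step in that flow reduces to a simple check against error budgets and latency. A sketch — the thresholds and field names are invented for illustration:

```python
# Illustrative canary promotion gate. Thresholds are assumptions,
# not product defaults.
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    error_rate: float         # fraction of failed requests
    p99_latency_ms: float
    error_budget_left: float  # fraction of the SLO budget remaining


def gate_promotion(m: CanaryMetrics) -> str:
    """Decide whether a canary should be promoted, held, or rolled back."""
    if m.error_rate > 0.05 or m.error_budget_left <= 0:
        return "rollback"
    if m.p99_latency_ms > 500:
        return "hold"  # keep the canary running, wait for more data
    return "promote"


print(gate_promotion(CanaryMetrics(0.01, 220, 0.6)))  # promote
```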
That matters because the hard part of delivery usually isn’t running one tool. It’s coordinating a pile of tools under real constraints. CI has long handled “run this script.” Modern delivery needs “run the right sequence, with enough context to know when to stop.”
Harness is also pointing at workflows platform teams already know well. Policy-as-code through OPA and Rego. Artifact signing. SBOM generation in SPDX or CycloneDX. SLSA provenance. Progressive delivery with observability-driven rollback. Human signoff for material changes. The promise is less glue code and less waiting around.
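As a concrete example of the glue code this would replace, a pipeline step today typically shells out to tools like syft and cosign. The sketch below only constructs the command lines rather than running them; the flags shown are common usage, but verify them against your installed versions:

```python
# Builds the argv lists a pipeline step would run to produce an SPDX
# SBOM and sign a container image. Flags are typical syft/cosign usage;
# check them against the versions you actually run.
def sbom_and_sign_commands(image: str, key_path: str) -> list[list[str]]:
    return [
        ["syft", image, "-o", "spdx-json"],           # SBOM in SPDX JSON
        ["cosign", "sign", "--key", key_path, image],  # keyed signature
    ]


for cmd in sbom_and_sign_commands("registry.example.com/payments:1.4.2",
                                  "cosign.key"):
    print(" ".join(cmd))
```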
If it works, that’s valuable. Plenty of engineering time disappears into orchestration overhead nobody actually likes.
Why the Traceable merger matters
Earlier in 2025, Jyoti Bansal merged API security company Traceable into Harness. That says something about where the company thinks this market is going. Delivery and application security are moving toward the same control plane.
That tracks.
An API change is a release event, but it can also be a security event, a compliance event, and sometimes the start of a customer-facing incident. If a new endpoint goes live, the delivery system should know whether it has schema validation, rate limiting, the right WAF coverage, and the expected auth policies. If attack traffic jumps right after a deployment, the system should connect that back to the rollout quickly.
Most teams still split those responsibilities across too many tools and too many owners. DevOps ships. AppSec scans. SRE watches error budgets. Compliance arrives later asking for evidence. It’s a clumsy setup, and AI summaries on top don’t fix it.
A shared data model might.
That’s why this looks more substantial than another AI feature announcement inside a DevOps suite. Harness is trying to tie release orchestration, security signals, governance, and cloud cost controls to the same graph. It’s ambitious, but the logic is sound.
The market is crowded, but the bottleneck moved
Harness is up against GitHub, GitLab, Jenkins, and CloudBees, among others. All of them are pushing AI deeper into delivery workflows.
GitLab Duo already generates pipeline scaffolding and security workflows. GitHub is threading policy and security deeper into Actions. Jenkins is still everywhere, especially in large enterprises, but standardization takes work and plugin sprawl remains a tax. CloudBees stays focused on release orchestration and enterprise controls.
Harness’s argument is that the problem now extends past CI/CD, and that seems right. The old category lines matter less than they used to. Delivery systems are becoming decision systems. Once AI-generated code increases throughput, every weak point downstream gets easier to see.
Cost is part of that. Running a full integration suite on every change is expensive. Over-scanning low-risk services is expensive. Shipping a bad rollout costs more. A system that can narrow test scope, apply the right controls, and stop a deployment early based on telemetry could save money in ways developers and finance teams both notice.
Harness says it already has some scale: more than 1,000 enterprise customers, 128 million deployments, 81 million builds, 1.2 trillion API calls protected, and $1.9 billion in cloud spend optimized over the past year. Those are vendor numbers, so treat them accordingly, but they do suggest the company is operating well past the demo stage.
Where this gets hard
The story is easy to tell. The implementation is where this either pays off or turns into a very expensive abstraction layer.
For the model to work, the metadata has to be good. A knowledge graph is only as useful as the systems feeding it. If service ownership is stale, test coverage mappings are weak, policies are inconsistent, or observability tags are a mess, the agent will make bad calls and sound confident doing it.
Then there’s safety. Any AI agent touching production needs hard operational limits:
- dry runs before apply
- immutable artifacts
- scoped RBAC
- time-boxed canaries
- automatic rollback
- durable audit logs
- signed artifacts and provenance attestations
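Several of those limits can be enforced mechanically around whatever the agent proposes, rather than trusted to the agent itself. A sketch of such a wrapper — every function hook and field name here is a hypothetical stand-in:

```python
# Guard-rail wrapper around an agent-proposed deployment plan:
# dry run first, time-boxed canary, automatic rollback, durable audit log.
# All hooks (apply_fn, rollback_fn, healthy_fn) are hypothetical stand-ins.
import json
import time


def guarded_apply(plan, apply_fn, rollback_fn, audit_path="audit.log",
                  canary_seconds=600, healthy_fn=lambda: True):
    record = {"plan": plan, "started": time.time(), "outcome": None}
    try:
        apply_fn(plan, dry_run=True)             # dry run before apply
        apply_fn(plan, dry_run=False)            # real apply
        deadline = time.time() + canary_seconds  # time-boxed canary
        while time.time() < deadline:
            if not healthy_fn():
                rollback_fn(plan)                # automatic rollback
                record["outcome"] = "rolled_back"
                break
            time.sleep(min(5, max(0, deadline - time.time())))
        else:
            record["outcome"] = "promoted"
    finally:
        with open(audit_path, "a") as f:         # durable audit trail
            f.write(json.dumps(record) + "\n")
    return record["outcome"]
```

The point of the shape is that the dry run, the rollback path, and the audit write are unconditional: they happen no matter what the agent decided.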
Without those controls, agentic delivery just introduces fresh failure modes. A bad code suggestion is annoying. A bad production rollout becomes a 3 a.m. incident, and sometimes a disclosure problem.
Auditability matters just as much. If an AI system changes deployment behavior or policy enforcement, teams need a clean record of who approved what, what the model recommended, what actually ran, and which evidence was attached. That points straight at tooling like Sigstore, cosign, Fulcio, and Rekor, even if those details don’t make the press release.
The last issue is lock-in. The more one vendor becomes your delivery brain, the harder it is to pull them out later. That’s why portable policy definitions, Git-stored controls, open telemetry standards, and standard artifact formats matter. Teams should worry about that early, not after the renewal quote lands.
What engineering leaders should do with this
If you run platform engineering, DevSecOps, or a large application estate, the takeaway is pretty direct. Code output is going up. The pressure shifts to verification, policy, rollout safety, and evidence collection. That’s where teams are losing time now.
The sensible way to test a platform like this is narrow and measurable:
- pick a high-change service with slow or flaky tests
- try change-based test selection
- wire in progressive delivery on one production path
- express approval rules in OPA
- attach telemetry gates using OpenTelemetry
- sign artifacts and generate SBOMs as a baseline
Then watch the metrics that matter: merge-to-prod time, rollback frequency, compute spend, incident rate, audit prep effort.
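Baselining those numbers doesn't require the platform itself; a handful of deployment records is enough to measure before and after. A minimal sketch, with invented record fields:

```python
# Compute baseline delivery metrics from deployment records.
# Record fields (merged_at, deployed_at in unix seconds, rolled_back)
# are invented for illustration.
from statistics import median


def delivery_metrics(deploys):
    lead_times = [d["deployed_at"] - d["merged_at"] for d in deploys]
    rollbacks = sum(1 for d in deploys if d["rolled_back"])
    return {
        "median_merge_to_prod_s": median(lead_times),
        "rollback_rate": rollbacks / len(deploys),
    }


print(delivery_metrics([
    {"merged_at": 0, "deployed_at": 3600, "rolled_back": False},
    {"merged_at": 0, "deployed_at": 7200, "rolled_back": True},
]))
```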
If those move in the right direction, the platform is earning its place. If all you get is cleaner dashboards and slower procurement, skip it.
Harness still has plenty to prove. But the thesis holds up. AI has already sped up code creation. The next fight is over everything that happens once that code hits the delivery system, and that fight sits much closer to production.