Is Cognition saying Devin will replace software engineers?

No. Scott Wu says Cognition sees Devin as a tool for taking on software tasks, not as a replacement for programmers.

What level of work does Devin perform?

Wu describes Devin as performing at a level somewhere between a junior and a mid-level engineer, depending on the task.

Where are AI coding agents most useful right now?

They are most useful on structured maintenance work such as dependency upgrades, migrations, test generation, refactors, and fixing lint failures or deprecated API calls.

Generative AI · May 31, 2026

Cognition’s Scott Wu draws a line between AI coding tasks and jobs


Cognition says Devin won’t replace developers. Its own numbers make that harder to believe

Cognition CEO Scott Wu is trying to hold a line every AI coding company now has to hold carefully: AI agents should take over software tasks, but not software jobs.

That line became harder to defend after Cognition raised $1 billion at a reported $26 billion valuation. The two-year-old company makes Devin, one of the best-known AI coding agents, and its fundraising announcement talked about “self-driving software development.” That phrase lands differently in 2026, when tech executives keep pointing to AI while cutting headcount.

Wu told TechCrunch that replacing programmers has “never been our view.” He describes Devin as a “buddy” for developers, a system that helps engineers build more and shed maintenance work many of them dislike. Devin can work independently, he said, but generally performs “somewhere between a junior and a mid-level engineer,” depending on the task.

Then there’s Cognition’s own metric. The company says 89% of code committed by its engineers is committed by Devin, with the rest coming from local agents in Windsurf, the AI coding company Cognition acquired last year. That is an extraordinary internal number, even with a long list of caveats. A commit stat doesn’t show who wrote the spec, reviewed the change, debugged the failure, designed the architecture, or took responsibility when production broke.

Still, technical leaders will notice.

Coding agents have moved past autocomplete

Devin is part of the shift from code completion to task execution. Earlier AI coding assistants, including GitHub Copilot-style tools, mostly lived inside the editor. They suggested functions, filled in boilerplate, generated tests, or explained snippets. The developer stayed close to every step.

Agentic coding tools take on larger chunks of work: clone a repo, inspect an issue, plan changes, edit files, run tests, read failures, patch bugs, and open a pull request. That’s why Wu says Devin “naturally owns tasks end to end.”

For senior engineers, the distinction matters. A coding agent’s value no longer comes from generating syntactically valid Python or TypeScript. That’s table stakes. The harder work sits at the boundary between code, tooling, dependencies, CI, ticket descriptions, test suites, build systems, and human intent.

A useful agent has to:

maintain context across a codebase, not just one file
understand project conventions without being spoon-fed every rule
run commands and interpret failures
avoid destructive changes in shared environments
produce reviewable diffs rather than sprawling rewrites
know when it’s stuck

That last point is easy to underrate. The most expensive AI coding failures often aren’t bad suggestions. They’re confident half-solutions that waste engineer time in review.
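One way to make "know when it's stuck" concrete is a retry budget with an escalation rule. The sketch below is illustrative only and does not describe how Devin works: `Attempt`, `run_with_escalation`, and the "same failure twice" heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    passed: bool
    failure: str = ""  # short failure signature, e.g. the first failing test name


def run_with_escalation(
    attempt_fix: Callable[[int], Attempt],  # hypothetical: runs one edit-and-test cycle
    max_attempts: int = 3,
) -> str:
    """Return "open_pr" on success, or "escalate" when the agent appears stuck."""
    last_failure: Optional[str] = None
    for n in range(1, max_attempts + 1):
        result = attempt_fix(n)
        if result.passed:
            return "open_pr"
        # Hitting the same failure twice in a row suggests more retries won't help.
        if result.failure == last_failure:
            return "escalate"
        last_failure = result.failure
    # Budget exhausted without a passing run: hand the task back to a human.
    return "escalate"
```

The design choice is that escalation is a normal outcome, not an error. A short "I could not fix this, here is what I tried" costs a reviewer far less than a confident diff that does not work.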

“Junior to mid-level” sounds plausible, with limits

Wu’s “junior to mid-level engineer” framing is one of the more believable claims in AI coding because it includes the qualifier “depending on the task.”

Long-tail maintenance work is where agents should improve first. Dependency upgrades. Framework migrations. Test generation. API client refactors. Moving old services to newer platforms. Cleaning up lint failures. Replacing deprecated calls. These tasks often have clear patterns, measurable success criteria, and plenty of examples inside the repo.

They’re also boring enough that teams postpone them until the work becomes risky.

For a tech lead, that’s the practical pitch. If an agent can upgrade a service from an old SDK version, run the test suite, patch obvious failures, and produce a pull request a human can review in 20 minutes, it’s useful. No science fiction required. Just decent repository understanding, tool execution, and guardrails.

But junior-to-mid-level performance doesn’t map cleanly to headcount. A mid-level engineer doesn’t only complete tickets. They ask whether the ticket should exist, spot weak requirements, challenge fragile designs, and understand the social map of a codebase. They know which test failure is a flaky fixture and which one points to a production bug. They know which “small migration” will break a downstream team’s workflow.

Agents are getting better at code mechanics. They’re still weak at organizational context.

The 89% commit figure deserves skepticism

Cognition’s claim that Devin commits 89% of its engineers’ code is the kind of stat that will show up in board decks fast. It needs scrutiny before anyone builds a staffing model around it.

Commits are a noisy proxy. A tool can generate most of the lines or commits while humans still do the highest-value work: choosing the task, setting constraints, reviewing the result, staging rollout, and taking accountability. In some workflows, an agent may be the committer of record even when a human closely directed the change.
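To see how much the attribution rule moves the number, consider a toy commit log. Everything here is hypothetical: the `Commit` fields, the `directed_by_human` flag, and the counts are invented for illustration and say nothing about Cognition's actual data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    committer: str           # account recorded as the committer
    directed_by_human: bool  # hypothetical flag: a human specified and steered the change


def agent_share(commits: List[Commit], agent: str = "agent",
                count_directed: bool = True) -> float:
    """Fraction of commits credited to the agent under a given attribution rule."""
    if not commits:
        return 0.0
    credited = sum(
        1 for c in commits
        if c.committer == agent and (count_directed or not c.directed_by_human)
    )
    return credited / len(commits)


# Toy log: 9 of 10 commits are pushed by the agent, 6 of those under close human direction.
log = (
    [Commit("agent", True)] * 6
    + [Commit("agent", False)] * 3
    + [Commit("alice", False)]
)

by_committer_of_record = agent_share(log)                    # 0.9
only_undirected = agent_share(log, count_directed=False)     # 0.3
```

The same repository history supports a 90% headline and a 30% headline. Neither is wrong, which is why a commit share needs its counting rule stated alongside it.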

There’s also selection bias. Cognition builds the agent, hires people who want to work this way, and likely structures internal work around Devin’s strengths. Its repo practices, test infrastructure, and review culture probably evolved around agent execution. That doesn’t mean a bank, healthcare vendor, or 15-year-old enterprise SaaS product can reproduce the same number next quarter.

Strong engineering environments make agents look much better. Clean test coverage, fast CI, clear module boundaries, typed APIs, reproducible dev containers, and high-quality tickets all improve reliability. A tangled monolith with tribal knowledge and slow integration tests will expose the limits quickly.

That’s deployment reality, not a knock on Cognition.

Self-driving software still needs supervision

“Self-driving software development” is a catchy phrase, and a risky one if people take it literally.

Autonomous coding agents can already handle bounded work in controlled environments. Full software development also includes product judgment, security posture, regulatory requirements, customer empathy, incident response, and long-term maintenance costs. A system can generate code that passes tests and still produce a bad product decision.

The self-driving car analogy cuts both ways. Autonomy looks impressive in demos and constrained routes. Edge cases define the product. In software, those edge cases include underspecified requirements, hidden coupling, undocumented dependencies, flaky CI, secrets handling, license contamination, data privacy rules, and production behavior no test suite covers.

Security is a particular pressure point. Giving an agent access to repositories, package managers, terminals, cloud resources, and issue trackers expands the attack surface. Prompt injection against developer tooling is no longer a toy concern when agents read tickets, docs, webpages, logs, and dependency metadata. A malicious instruction buried in an issue comment or README can try to steer an agent into exfiltrating secrets or weakening checks.

Teams adopting agents need to treat them as semi-trusted automation. That means scoped credentials, sandboxed execution, audit logs, branch protections, mandatory review for sensitive paths, and policy checks for dependencies and infrastructure changes.
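Mandatory review for sensitive paths can be as simple as a pattern check over the files an agent touched. The sketch below is a minimal example under stated assumptions: the patterns and the `sensitive_paths` helper are hypothetical, and a real setup would more likely enforce this through the code host's branch protection and code-owner rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical patterns; each team would define its own list.
SENSITIVE_PATTERNS = [
    "infra/*",              # infrastructure definitions
    ".github/workflows/*",  # CI configuration
    "*/auth/*",             # authentication code
    "requirements*.txt",    # dependency manifests
]


def sensitive_paths(changed: Iterable[str],
                    patterns: Iterable[str] = SENSITIVE_PATTERNS) -> List[str]:
    """Return the changed paths that match any sensitive pattern, sorted."""
    pats = list(patterns)
    # fnmatchcase's "*" also matches "/" so "infra/*" covers nested files.
    return sorted(p for p in changed if any(fnmatchcase(p, pat) for pat in pats))


def requires_human_review(changed: Iterable[str]) -> bool:
    """An agent-authored change needs sign-off if it touches any sensitive path."""
    return bool(sensitive_paths(changed))
```

A check like this belongs outside the agent's own process, for example in CI, so that a prompt-injected agent cannot simply skip it.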

The performance question also isn’t settled. Agent workflows can burn a lot of tokens and compute while iterating through failures. If the agent spends 40 minutes and many model calls to produce a mediocre diff, the economics look different from a clean demo. Latency matters when engineers are waiting. Cost matters when every team starts spawning agents for routine tickets.
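A rough way to frame the economics is cost per merged pull request, counting both model spend and the engineer time spent reviewing every run, including the ones that get thrown away. All figures below are placeholder assumptions, not vendor pricing or measured data.

```python
def cost_per_merged_pr(
    runs: int,                       # agent runs started
    merged: int,                     # runs whose PR was actually merged
    tokens_per_run: int,             # assumed average tokens consumed per run
    usd_per_million_tokens: float,   # assumed blended model price
    review_minutes_per_run: float,   # assumed human review time per run
    engineer_usd_per_hour: float,    # assumed loaded engineer cost
) -> float:
    """Total model and review cost divided by the number of merged PRs."""
    if merged <= 0:
        raise ValueError("no merged PRs: cost per merged PR is undefined")
    model_cost = runs * tokens_per_run / 1_000_000 * usd_per_million_tokens
    review_cost = runs * review_minutes_per_run / 60 * engineer_usd_per_hour
    return (model_cost + review_cost) / merged


# Placeholder scenario: 100 runs, half of them merged.
example = cost_per_merged_pr(
    runs=100, merged=50,
    tokens_per_run=2_000_000, usd_per_million_tokens=5.0,
    review_minutes_per_run=15, engineer_usd_per_hour=120.0,
)  # 80.0 USD per merged PR, of which 60.0 is review time
```

In this made-up scenario, review time dominates model spend three to one. The merge rate and the minutes a reviewer spends per diff matter more than the token bill.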

Why Wu’s developer-first framing matters

Wu has credibility with programmers partly because he is one. He started coding as a child, built a reputation as an elite competitive programmer, and came out of a generation of math and programming prodigies that also produced founders like Scale AI’s Alexandr Wang. His stated concern that AI shouldn’t take the joy out of programming doesn’t sound pasted on after a PR review.

It also serves Cognition commercially. Developers are the buyers, users, and skeptics of these tools. A company selling an AI coding agent can’t afford to sound like it’s cheering for the removal of engineers, even if CFOs hear something different in the pitch.

Wu frames agents as a new abstraction layer, similar in spirit to higher-level languages and visual development tools that moved programmers away from machine-level details. The comparison is useful, within reason. Abstractions increase productivity by hiding lower-level complexity, but they also create new failure modes. Engineers still need to understand what sits underneath when performance collapses, memory leaks, security boundaries fail, or generated code quietly violates an invariant.

A developer who uses Devin well may end up working at a higher level of intent: defining tasks, reviewing plans, setting constraints, reading diffs, and validating behavior. That’s still engineering, but the center of gravity shifts. Less typing. More supervision. More taste.

Some people will enjoy that. Others got into software because they like touching the code directly.

What technical leaders should take from this

The practical question isn’t whether AI agents “replace developers.” That wording is too blunt to guide decisions. A better question is which parts of your engineering process can tolerate probabilistic execution with human review.

Start with tasks that have tight feedback loops and low blast radius. Dependency updates in non-critical services. Test backfills. Documentation fixes. Mechanical refactors. Internal tools. Migration prep. Let agents open pull requests, then measure review time, defect rates, rework, CI cycles, and production incidents. Track whether senior engineers are actually freed up or just spending their days babysitting noisy diffs.
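The measurements above fit in a small per-PR record that can be summarized weekly. This is a minimal sketch: the `AgentPR` fields and the `summarize` helper are hypothetical names, and the data would come from whatever code host and incident tracker a team already uses.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List


@dataclass
class AgentPR:
    review_minutes: float   # human time spent reviewing the PR
    ci_runs: int            # CI cycles before the PR went green
    reworked: bool          # a human had to push follow-up commits before merge
    caused_incident: bool   # later linked to a production incident


def summarize(prs: List[AgentPR]) -> Dict[str, float]:
    """Aggregate review cost and quality signals for agent-authored PRs."""
    if not prs:
        raise ValueError("no PRs to summarize")
    n = len(prs)
    return {
        "median_review_minutes": median(p.review_minutes for p in prs),
        "mean_ci_runs": mean(p.ci_runs for p in prs),
        "rework_rate": sum(p.reworked for p in prs) / n,
        "incident_rate": sum(p.caused_incident for p in prs) / n,
    }
```

Comparing these numbers against the same metrics for human-authored PRs on similar tasks is what shows whether senior engineers are being freed up or just reviewing more.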

Teams should also prepare their codebases for agent work. The same investments that help humans help agents: clearer interfaces, stronger tests, typed contracts, smaller modules, reproducible environments, and better issue descriptions. Messy repos tax everyone. Agents just make the cost more visible.

The staffing implications are uncomfortable. If agents reliably absorb junior-level tasks, companies may hire fewer junior engineers. That creates a pipeline problem because senior engineers don’t appear fully formed. The industry already struggles with entry-level hiring. AI coding agents could make that worse if companies treat junior work as waste instead of a training ground.

Cognition’s position is that agents remove toil and leave humans with the creative work. That optimistic version is plausible in strong teams with disciplined review. The harsher version is that companies use agent productivity to squeeze teams, reduce apprenticeship, and push more accountability onto fewer senior people.

Both outcomes are possible. Management choices will decide which one dominates.

Wu says humans should remain in control of what agents do, in software and in other fields such as customer service and medicine. That’s the right principle. It also needs enforcement through product design, permissions, auditability, and organizational norms. A human “in the loop” who rubber-stamps 30 agent-generated pull requests before lunch isn’t meaningful oversight.

Devin’s progress is real enough to take seriously. So is the gap between committing code and owning software. The next phase of AI coding will be judged in code review, CI logs, incident reports, and the quiet question every engineering manager asks after a quarter of usage: did this make the team better, or just busier in a new way?
