Artificial Intelligence · December 21, 2025

Cursor acquires Graphite, adding stacked pull requests and AI code review

Cursor buys Graphite and moves deeper into the software delivery pipeline

Cursor has acquired Graphite, the startup best known for stacked pull requests and AI-assisted code review. Axios reports Cursor paid well above Graphite’s last private valuation of $290 million. For a company reportedly valued at $29 billion in November, that’s a pricey deal, but it fits.

Cursor already has the code generation side covered. Graphite gives it a stronger position in everything that happens after code is written: splitting changes up, reviewing them, rebasing, testing, and merging. That’s a good place to spend money. AI coding tools can produce a lot of code quickly. Teams still struggle to land large changes safely.

Graphite is built for that gap.

Why Graphite matters

Graphite’s core idea is the stacked PR. Instead of one oversized pull request mixing schema changes, API updates, refactors, test fixes, and feature logic, you break the work into a sequence of smaller PRs, each depending on the one before it.

So instead of:

  • one 2,000-line PR nobody wants to review

you get:

  • PR A: add a new database field
  • PR B: backfill data
  • PR C: update service logic
  • PR D: switch reads to the new field
  • PR E: remove old code

The idea is simple. The painful part is keeping that stack in shape while the underlying code keeps moving. Graphite built its product around that problem: dependency-aware PRs, automatic rebasing, status propagation, and review flows that treat a chain of changes as a connected unit instead of a mess of Git branches.
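
Here's a rough Python sketch of the bookkeeping that implies. The names and fake commit IDs are illustrative, not Graphite's implementation; a real tool would be driving git rebase --onto and updating branches rather than mutating strings.

    from dataclasses import dataclass

    @dataclass
    class StackedPR:
        """One PR in a stack; parent is the PR this change builds on."""
        name: str
        base_commit: str                      # commit the branch is based on
        head_commit: str                      # current tip of the branch
        parent: "StackedPR | None" = None
        needs_restack: bool = False

    def mark_descendants_stale(changed: StackedPR, stack: list[StackedPR]) -> None:
        """When a PR's head moves (new commit, amend, merge), everything
        stacked above it is now based on an outdated commit."""
        for pr in stack:
            ancestor = pr.parent
            while ancestor is not None:
                if ancestor is changed:
                    pr.needs_restack = True
                    break
                ancestor = ancestor.parent

    def restack(stack: list[StackedPR]) -> None:
        """Rebase stale PRs onto their parent's new head, bottom-up.
        A real tool would run git rebase --onto here and re-trigger CI."""
        for pr in stack:                      # stack is ordered parent-first
            if pr.needs_restack and pr.parent is not None:
                pr.base_commit = pr.parent.head_commit
                pr.head_commit = f"rebased:{pr.name}"   # stand-in for the new tip
                pr.needs_restack = False

    # The bottom three PRs of the example stack above
    a = StackedPR("add-db-field", "main", "a1")
    b = StackedPR("backfill", a.head_commit, "b1", parent=a)
    c = StackedPR("service-logic", b.head_commit, "c1", parent=b)
    stack = [a, b, c]

    a.head_commit = "a2"                      # review feedback lands on PR A
    mark_descendants_stale(a, stack)
    restack(stack)                            # B and C follow automatically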

That’s useful on its own for backend-heavy teams, monorepos, migrations, and multi-step refactors. Add AI review and the value goes up. Models do better on a tight, well-scoped diff than on a giant PR full of unrelated changes.

That’s the logic behind the deal.

Cursor is building out an SDLC stack

This also matches Cursor’s recent pattern. It bought Growth by Design in November and picked up talent from Koala in July. The company is clearly moving past the AI editor category and toward a broader software delivery platform.

The market is crowded, and raw code generation is already starting to look interchangeable. If every serious coding assistant can autocomplete, refactor, and spit out boilerplate, the next fight is over workflow control.

Who owns the path from first draft to merged code?

That’s where the value sits. Review, policy checks, CI integration, test generation, merge orchestration, enterprise controls, auditability. The boring parts. Also the parts that decide whether a team actually ships faster or just creates more cleanup work.

Cursor already has Bugbot for AI review. Graphite gives that system a better structure to work with. Reviewing a stack of small, dependency-aware PRs gives the model cleaner context, clearer intent, and a better chance of finding real issues.

That’s much stronger than dropping an LLM into a standard GitHub PR and hoping it says something useful.

What the combined product likely looks like

The obvious integration path is straightforward:

  1. Write or refactor code in Cursor.
  2. Split the work into a stack of small PRs in Graphite.
  3. Run Bugbot review on each PR in the stack.
  4. Apply suggested fixes directly into the stack.
  5. Use CI and policy signals as merge gates.

If Cursor executes well, that becomes a solid loop for teams that already live in GitHub and care about review quality. Write code in the editor, preserve intent as a sequence of changes, and let the review system reason about each step with less noise.

The hard part is under the hood. Good AI review in this setup probably needs a few systems working together.

Repository-aware context

A decent reviewer can’t work off raw diff text alone. It needs repository indexing, import graphs, call chains, symbol lookup, ownership metadata, maybe even historical PR context. Without that, it will miss the same bugs generic review bots keep missing: side effects, policy violations, and changes that look harmless locally but break assumptions somewhere else.

In large repos, that context has to be incremental. Re-indexing a monorepo on every push is expensive and annoying.
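
As a rough illustration of the incremental version, here's a sketch using Python's ast module as a stand-in for a real symbol indexer. The function names and the module-naming shortcut are hypothetical:

    import ast
    from pathlib import Path

    # module name -> set of modules it imports
    import_graph: dict[str, set[str]] = {}

    def index_file(path: Path) -> None:
        """Parse one Python file and record its import edges. Uses the
        file stem as the module name, which a real indexer would not."""
        deps: set[str] = set()
        for node in ast.walk(ast.parse(path.read_text())):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        import_graph[path.stem] = deps

    def update_index(changed_files: list[Path]) -> None:
        """Incremental update: re-parse only the files a push touched,
        instead of re-indexing the whole monorepo on every change."""
        for path in changed_files:
            index_file(path)

    def reverse_deps(module: str) -> set[str]:
        """Who imports this module? This is the question behind flagging
        changes that look harmless locally but break assumptions elsewhere."""
        return {m for m, deps in import_graph.items() if module in deps}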

Semantic diffing

Line diffs are a poor abstraction for review automation. AST-level comparison is a lot more useful, especially when a refactor changes structure without changing behavior, or when a tiny text edit hides a meaningful logic change.

A model that can see “condition changed from || to && inside auth middleware” is working with a much better signal than one staring at a few added and deleted lines in a large file.
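
Here's a toy version of that idea using Python's ast module, where the equivalent flip is "or" to "and". This is a naive positional comparison, not a real tree-diff algorithm:

    import ast

    def boolean_op_changes(before_src: str, after_src: str) -> list[str]:
        """Compare two file versions at the AST level and report boolean
        operators that flipped (Python's or/and, the analogue of the
        || to && example above)."""
        def bool_ops(src: str) -> list[tuple[int, str]]:
            return [
                (node.lineno, type(node.op).__name__)   # "Or" or "And"
                for node in ast.walk(ast.parse(src))
                if isinstance(node, ast.BoolOp)
            ]
        before, after = bool_ops(before_src), bool_ops(after_src)
        # Naive positional pairing; a real semantic differ would match
        # nodes structurally rather than by order of appearance.
        return [
            f"line {line}: {old} -> {new}"
            for (line, old), (_, new) in zip(before, after)
            if old != new
        ]

    before = "def allowed(user):\n    return user.active or user.admin\n"
    after = "def allowed(user):\n    return user.active and user.admin\n"
    print(boolean_op_changes(before, after))   # ['line 2: Or -> And']

A one-character text edit shows up here as a loud, named signal, which is exactly the kind of input a review model can use.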

Policy-aware checks

This is where enterprise buyers start paying attention. Teams don’t want a bot that mostly suggests variable renames. They want checks tied to actual engineering rules:

  • no outbound network calls in auth code
  • migrations need a backfill path
  • type checks must pass in specific services
  • hot-path changes trigger performance scrutiny
  • secrets or keys never enter the repo

An AI reviewer is useful here if it can sit on top of tools like eslint, pylint, semgrep, type checkers, and security scanners, summarize findings in context, and propose fixes without inventing policy.
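
A hedged sketch of that shape: run semgrep against an in-repo rule pack and turn its JSON output into reviewable findings. The policy/ rules directory is hypothetical, and the field access reflects semgrep's documented JSON output, but treat the details as assumptions:

    import json
    import subprocess

    def run_policy_checks(paths: list[str], rules: str = "policy/") -> list[dict]:
        """Run semgrep with a team's own rule pack. policy/ is a
        hypothetical directory of in-repo rules, e.g. one that forbids
        outbound network calls in auth code."""
        result = subprocess.run(
            ["semgrep", "scan", "--json", "--config", rules, *paths],
            capture_output=True, text=True,
            check=False,   # semgrep exits non-zero when findings exist
        )
        return json.loads(result.stdout).get("results", [])

    def summarize(findings: list[dict]) -> str:
        """Turn raw findings into review comments. The rules decide what
        counts as a violation; the model's job is to explain, not invent."""
        lines = []
        for f in findings:
            where = f"{f['path']}:{f['start']['line']}"
            lines.append(f"{f['check_id']} at {where}: {f['extra']['message']}")
        return "\n".join(lines) or "No policy violations found."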

CI as a first-class signal

A failing test shouldn’t just show up as a red X. It should feed the review process. Which part of the stack likely caused it? Is this flaky infrastructure or a real regression? Did coverage drop across the stack? Did a lower PR introduce the issue and only expose it higher up?

Stacked PR workflows help because they give the system a dependency graph it can reason about.
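
A minimal sketch of that reasoning: given per-PR CI results for a stack ordered bottom to top, attribute a failure to the lowest PR where it first appears. The data shape and names are assumed, not any vendor's API:

    def first_failing_pr(stack: list[str],
                         ci_results: dict[str, dict[str, bool]]) -> str | None:
        """Walk the stack bottom to top and return the lowest PR where a
        test first starts failing; PRs above it merely inherit the break."""
        seen_failing: set[str] = set()
        culprit = None
        for pr in stack:
            newly_failing = {
                test for test, passed in ci_results[pr].items()
                if not passed and test not in seen_failing
            }
            if newly_failing and culprit is None:
                culprit = pr
            seen_failing |= newly_failing
        return culprit

    stack = ["add-db-field", "backfill", "service-logic"]
    ci_results = {
        "add-db-field":  {"schema_test": True, "read_path_test": True},
        "backfill":      {"schema_test": True, "read_path_test": False},
        "service-logic": {"schema_test": True, "read_path_test": False},
    }
    print(first_failing_pr(stack, ci_results))   # "backfill" introduced it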

Why this fits AI, and where it still breaks

AI review has a real shot on small PRs. Smaller diffs reduce ambiguity. Intent is easier to infer. Failure modes are easier to isolate. Suggested fixes are less risky. Review latency drops when humans are looking at 100 lines instead of 1,500.

It’s still not a solved problem.

LLM review quality drops fast when the context gets noisy, the codebase is highly domain-specific, or the policies that matter only exist in people’s heads. A model can be good at spotting generic bugs and still be mediocre at understanding why a particular migration is dangerous in your environment. It can also produce comments that sound smart and waste everyone’s time.

Stacked PRs help. They don’t remove that risk.

There’s also a limit to how much workflow complexity teams will accept. Stacks are great for disciplined engineering orgs. They can feel like overhead for smaller teams making straightforward changes. If the tooling gets fiddly, people will work around it. Git habits are hard to change.

Cursor now has to do both jobs well: keep the workflow powerful enough for serious teams and light enough that it doesn’t feel like process theater.

The market around this is tightening up

This deal lands in a busy part of the market. CodeRabbit raised in September at a reported $550 million valuation. Greptile announced a $25 million Series A this fall. Everyone is chasing some version of the same idea: AI should sit in code review and cut the time between writing and shipping.

Cursor has a better position than most because it spans authoring and review. That matters. It can potentially connect intent from the editor to the PR layer instead of reconstructing everything after the fact from a diff. In theory, that should lead to better review comments, better test suggestions, and better automated fixes.

Enterprise buyers will ask the usual questions anyway. Where is repository context stored? Is inference cloud-based or self-hosted? How much code gets indexed? Are review artifacts auditable? Can findings be exported through SARIF and fed into existing compliance tooling? Regulated teams will care a lot more about that than about how polished the chatbot feels.
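
The SARIF part, at least, is mostly plumbing. A minimal sketch against SARIF 2.1.0, with a hypothetical internal finding shape:

    import json

    def to_sarif(findings: list[dict], tool_name: str = "ai-reviewer") -> str:
        """Emit findings as a minimal SARIF 2.1.0 document so they can
        flow into existing compliance tooling. The input shape
        {"rule", "message", "file", "line"} is a made-up internal format."""
        results = [
            {
                "ruleId": f["rule"],
                "level": "warning",
                "message": {"text": f["message"]},
                "locations": [{
                    "physicalLocation": {
                        "artifactLocation": {"uri": f["file"]},
                        "region": {"startLine": f["line"]},
                    },
                }],
            }
            for f in findings
        ]
        doc = {
            "version": "2.1.0",
            "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
        }
        return json.dumps(doc, indent=2)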

What engineering leaders should watch

If you’re evaluating this category, demos are the least interesting part. Watch whether the product improves software delivery without adding friction.

A useful pilot probably looks like this:

  • work naturally breaks into 3 to 6 PRs
  • each PR can be reviewed in under 15 minutes
  • the repo already has some policy discipline
  • CI is reliable enough to act as a gate
  • teams are making changes where rollback and sequencing matter

Good candidates include schema migrations, large refactors, feature-flag rollouts, API version transitions, and infrastructure changes with dependent application updates.

And watch the metrics that actually matter: review turnaround time, rework after review, post-merge fixes, change failure rate. If the tool writes a lot of comments but doesn’t improve those numbers, it’s just adding noise.
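
Both are cheap to compute once PR metadata is exported somewhere queryable. A small sketch, assuming a hypothetical export with opened/approved timestamps and an incident flag:

    from datetime import datetime, timedelta
    from statistics import median

    def review_turnaround(prs: list[dict]) -> timedelta:
        """Median time from PR opened to first approval."""
        return timedelta(seconds=median(
            (pr["approved_at"] - pr["opened_at"]).total_seconds() for pr in prs
        ))

    def change_failure_rate(prs: list[dict]) -> float:
        """Share of merged changes that later needed a hotfix or rollback."""
        return sum(1 for pr in prs if pr["caused_incident"]) / len(prs)

    prs = [
        {"opened_at": datetime(2025, 12, 1, 9),
         "approved_at": datetime(2025, 12, 1, 11), "caused_incident": False},
        {"opened_at": datetime(2025, 12, 2, 9),
         "approved_at": datetime(2025, 12, 3, 9), "caused_incident": True},
    ]
    print(review_turnaround(prs))      # median review latency
    print(change_failure_rate(prs))    # 0.5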

Cursor buying Graphite makes sense for a simple reason. The weak point in AI coding has never been generating code. It’s getting that code into production without creating a mess. Models can draft all day. Safe delivery is still where teams pay for mistakes.
