Artificial intelligence · April 27, 2026

Anthropic's Project Deal tests agent-to-agent commerce with real purchases

Anthropic’s Project Deal shows AI agents can negotiate real purchases, and that’s both impressive and a little worrying

Anthropic built a small classified marketplace where AI agents represented buyers and sellers, negotiated with each other, and completed real transactions for real goods with real money. It calls the experiment Project Deal.

This was a modest internal pilot. Anthropic says 69 employees took part, each with a $100 budget paid via gift cards to buy items from coworkers. The agents completed 186 deals worth more than $4,000.

That’s a small market. It still shows something useful: language models can already handle part of the messy back-and-forth between “find an item” and “close the deal.”

The most important result wasn’t that the agents could bargain. It was that better models got better outcomes, while people using weaker agents often didn’t realize they were getting a worse deal.

That’s a real product problem.

What Anthropic tested

Anthropic says Project Deal ran as four separate marketplaces. One was “real,” meaning participants were represented by the company’s most advanced model and the deals were actually honored afterward. The other three were for study.

That setup matters. This was tied to actual preferences, actual inventory, and actual follow-through, not a sandbox chat between two bots pretending to buy something.

So the findings carry more weight than most agent-economy talk.

Anthropic says it was “struck by how well Project Deal worked.” Fair enough. A system that gets 186 completed transactions among humans with minimal collapse is worth paying attention to. Multi-agent systems usually fail in dull, familiar ways. They loop, over-negotiate, lose track of constraints, or do the wrong thing with confidence. Getting useful exchanges out of a market setup is harder than a benchmark score makes it look.

The scope was still narrow:

  • Participants were all Anthropic employees
  • The pool was self-selected
  • The budget was tiny
  • Goods were sold among coworkers
  • The marketplace was temporary and tightly bounded

Those limits cut both ways. The results are less generalizable, but the smaller setting strips out a lot of real-world noise. That makes it easier to see whether the core behavior works.

It seems to.

Agent quality changes the outcome

Anthropic says people represented by more advanced models got objectively better outcomes. Product teams should pay attention to that.

If model quality directly changes negotiation performance, agent design starts to look a lot like market power.

A stronger model can probably do a few things better:

  • infer a user’s real preferences from sparse instructions
  • identify better counterparties
  • frame offers more persuasively
  • avoid bad concessions
  • judge when to hold firm and when to close
  • track constraints across a longer exchange

None of that is mysterious. Better language models tend to do better on semi-structured tasks with ambiguity and incentives.

The harder part is what Anthropic says happened next: users didn’t seem to notice the gap.

So you can end up with two people using AI shopping assistants or sales agents, and one keeps getting better terms because their model is better tuned, better prompted, or simply stronger. The weaker side may not know it’s losing.

Anyone who’s worked on ranking, ad auctions, recommenders, or financial tools will recognize the pattern. Information asymmetry was already a problem. Now some of it may sit inside the model layer.

That leaves a blunt product question. If your platform lets agents transact for users, do you disclose capability differences, or hide them behind a clean UI and call it automation?

Prompting mattered less than model capability

Another result stands out. Anthropic says the initial instructions given to agents didn’t appear to affect sale likelihood or negotiated prices.

That matters because a lot of current agent product work still treats prompting as the main control surface. Rewrite the system prompt. Adjust the persona. Add a negotiation style. See what happens.

Project Deal suggests that in this setting, base model capability mattered more than high-level instruction wording.

Prompts still matter. But teams may be overstating how much clever prompt work can make up for a weaker model when the job involves strategy, memory, trade-offs, and adaptation over many turns.

If you’re building agent-mediated commerce, procurement bots, or B2B negotiation tools, the hard work is probably in:

  • model selection
  • state tracking
  • tool use and policy constraints
  • preference capture
  • evaluation on outcome quality, not just conversation quality

A polished prompt won’t rescue a mediocre agent.
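Evaluating "outcome quality, not just conversation quality" can start very simply. The sketch below, a minimal Python illustration, scores each deal against an independent reference price and averages by model tier; the field names (`negotiated_price`, `reference_price`, `model_tier`) are assumptions for this example, not a schema from Anthropic's experiment.

```python
# Sketch: score deals by outcome, not by how smooth the chat looked.
# Assumes each deal record carries the negotiated price plus an independent
# reference price (e.g. a median of comparable listings); both are
# illustrative fields, not from Project Deal's data.
from statistics import mean

def buyer_surplus(deal: dict) -> float:
    """Positive when the buyer paid less than the reference price."""
    return deal["reference_price"] - deal["negotiated_price"]

def surplus_by_tier(deals: list[dict]) -> dict[str, float]:
    """Average buyer surplus grouped by the agent's model tier."""
    tiers: dict[str, list[float]] = {}
    for deal in deals:
        tiers.setdefault(deal["model_tier"], []).append(buyer_surplus(deal))
    return {tier: mean(vals) for tier, vals in tiers.items()}

deals = [
    {"model_tier": "frontier", "negotiated_price": 42.0, "reference_price": 50.0},
    {"model_tier": "frontier", "negotiated_price": 18.0, "reference_price": 20.0},
    {"model_tier": "small",    "negotiated_price": 55.0, "reference_price": 50.0},
]
print(surplus_by_tier(deals))  # {'frontier': 5.0, 'small': -5.0}
```

A metric like this is exactly what surfaces the quality gap users didn't notice: a per-tier surplus table makes "the weaker agent loses" visible in a dashboard instead of buried in transcripts.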

Why this matters beyond an internal experiment

The obvious use case is e-commerce. The faster impact may show up in enterprise software.

A lot of business processes already look like low-stakes negotiation:

  • software procurement
  • scheduling and rescheduling
  • vendor quote comparison
  • ad inventory buying
  • freight and logistics coordination
  • internal resource allocation
  • support escalations with compensation offers

These jobs have bounded rules, repeated interactions, and a lot of language overhead. They fit semi-autonomous agents pretty well.

Project Deal points to a practical setup: put models between humans for the repetitive back-and-forth, then let humans approve the result.

The hard part starts after that. Trust, auditability, and failure handling are where this gets expensive.

If an agent gets a bad deal, who owns that? If two agents collude, how do you spot it? If a model learns that deceptive framing gets better prices, is that optimization or misconduct?

Those stop sounding academic once money is involved.

The engineering problems start where the demo stops

Anthropic’s marketplace was small and controlled. In production, agent-to-agent commerce gets messy fast.

Identity and authorization

An agent needs a clear authority boundary. What can it commit to? What spending limits apply? Can it accept substitutions, shipping delays, or bundle offers?

Without hard constraints, autonomous purchasing turns into a support problem.
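One minimal form of a hard authority boundary is a policy object the agent cannot override. The fields below (budget cap, allowed categories, substitution rights) are illustrative assumptions; a production policy layer would be richer and enforced server-side, outside the model's reach.

```python
# Minimal sketch of a hard authority boundary for a purchasing agent.
# Field names are illustrative; in production this check should run
# server-side so the agent cannot talk its way around it.
from dataclasses import dataclass, field

@dataclass
class PurchaseAuthority:
    budget_cap: float
    allowed_categories: set[str] = field(default_factory=set)
    may_accept_substitutions: bool = False

    def permits(self, price: float, category: str,
                is_substitution: bool = False) -> bool:
        """Every commitment the agent proposes must pass this gate."""
        if price > self.budget_cap:
            return False
        if category not in self.allowed_categories:
            return False
        if is_substitution and not self.may_accept_substitutions:
            return False
        return True

authority = PurchaseAuthority(budget_cap=100.0,
                              allowed_categories={"books", "electronics"})
print(authority.permits(45.0, "books"))                        # True
print(authority.permits(45.0, "books", is_substitution=True))  # False
```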

Verifiability

You need logs that show why a deal happened. Not just a transcript. A record of user preferences, policy checks, offer history, and the final approval path.

If an agent acts as a commercial representative, teams will want audit trails.

Strategic behavior

Once agents know they’re negotiating with other agents, they’ll start optimizing against predictable bot behavior. That can mean bluffing, stalling, exploiting timing patterns, or gaming platform rules.

In a small pilot, that looks manageable. At scale, it starts to look like adversarial market design.

Security and fraud

Agent marketplaces open up obvious attack surfaces: fake listings, reputation gaming, prompt injection in listing text, manipulative offer chains, attempts to extract policy or budget information.

The usual web app problems don’t disappear. They just get wrapped in natural language.

Fairness

Anthropic already found a quality-gap problem. That can turn into a class issue quickly. Premium users get stronger agents. Everyone else gets out-negotiated by default.

That may work as a business model. It’s a rough market model.

A standards problem is coming

If agent-on-agent commerce keeps moving forward, platforms will need common ways to express intent, constraints, identity, and settlement terms.

Right now, most agent frameworks are decent at orchestration demos and thin on transaction semantics. They can call tools and juggle context, but commerce needs stronger primitives:

  • budget caps
  • allowed negotiation ranges
  • fulfillment conditions
  • dispute handling
  • cancellation rules
  • machine-readable commitments

Without that layer, every AI agent marketplace risks turning into a stack of prompts, hidden heuristics, and brittle business logic.

That’s manageable in an internal pilot. It won’t hold up much beyond one.
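To make "machine-readable commitments" concrete, here is one possible shape for such a primitive: a frozen record that both agents can validate against their own hard limits before settlement. The schema is invented for illustration; no standard like this exists yet.

```python
# Sketch of a transaction-semantics primitive: a commitment both agents
# can validate before settlement. Invented schema, not a real standard.
from dataclasses import dataclass

@dataclass(frozen=True)
class Commitment:
    item_id: str
    price: float
    budget_cap: float        # buyer-side hard limit
    price_floor: float       # seller-side hard limit
    fulfillment: str         # e.g. "pickup-by-friday"
    cancellable_until: str   # ISO timestamp; later disputes go to review

    def valid(self) -> bool:
        """Settlement requires the price to sit inside both agents' ranges."""
        return self.price_floor <= self.price <= self.budget_cap

offer = Commitment("lamp-17", price=35.0, budget_cap=40.0,
                   price_floor=25.0, fulfillment="pickup-by-friday",
                   cancellable_until="2026-05-01T12:00:00Z")
print(offer.valid())  # True
```

The useful property is that validation lives in the data, not in either agent's prompt, so neither side can negotiate past its own constraints.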

What developers should take from this

Project Deal is small, but it lands a few solid points.

First, multi-agent systems can already do something economically useful in a constrained environment. That’s more concrete than most agent hype.

Second, outcome quality varies with model quality in ways users may not catch. If you build commercial agents, evaluation can’t stop at task completion. You need to measure deal quality, user welfare, and consistency across model tiers.

Third, prompt-level steering may matter less than teams hope. Better models and better system design still win.

And a working marketplace demo doesn’t solve the governance layer. It just makes it impossible to ignore.

Anthropic’s experiment makes agent commerce feel less speculative. It also sharpens the next question. If bots start cutting deals for people, the product challenge is making sure users know whose interests the agent is actually serving.
