Artificial Intelligence July 7, 2025

May Habib at Disrupt 2025 on moving AI agents into enterprise workflows

Writer wants enterprise AI agents to live inside real workflows. That’s the hard part

May Habib is taking the AI stage at TechCrunch Disrupt 2025 to talk about a problem plenty of enterprise teams still haven't solved: getting AI agents out of demos and into systems that actually matter.

A lot of enterprise AI still looks like a chat box sitting on top of a knowledge base, with a few API calls attached. Writer is selling something closer to operations software. The company says its Palmyra model family is built for regulated environments, and it's already working with customers including Uber, Intuit, and Vanguard on agents that handle content generation, compliance checks, and data extraction inside production workflows.

Investors are paying attention. Writer recently raised $326 million at a $1.9 billion valuation. For engineers, the better question is whether the technical stack holds up under that enterprise pitch, and where it still looks rough.

Why it matters

Companies want AI systems that can draft regulated reports, screen documents, route approvals, summarize case files, and extract structured data under latency, audit, and security constraints. That's a different job from chatbot UX.

Habib's talk matters because Writer sits in one of the busiest parts of the market. OpenAI, Anthropic, Google, Cohere, AWS, and plenty of smaller vendors are all chasing the same budget. Writer's argument is that enterprise adoption stalls on reliability and governance as much as model quality. That's a sensible read.

Large companies rarely get stuck because they can't call an API. They get stuck when legal, security, compliance, and IT show up.

Palmyra looks familiar, with enterprise packaging on top

From the details available, Palmyra follows a standard modern model playbook, tuned for enterprise sales.

Writer says Palmyra is trained on 1.5 trillion tokens from public data plus anonymized enterprise corpora, using a Mixture-of-Experts architecture. That fits the use case. MoE models can keep inference costs and performance in a reasonable range by activating only a subset of expert networks per request. If you're promising domain-specific behavior across finance, healthcare, and commerce, it's an obvious design choice.
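As a toy illustration of that routing idea (not Writer's implementation; the gating function, expert shapes, and dimensions here are invented), a top-k MoE layer evaluates only the experts the gate selects, leaving the rest idle for that request:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy Mixture-of-Experts layer: route the input through only the
    top-k experts by gate score, so most expert weights stay idle."""
    scores = x @ gate_w                      # gating logits, one per expert
    top = np.argsort(scores)[-top_k:]        # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over selected experts
    # Only the chosen experts run; the rest cost nothing this request.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

rng = np.random.default_rng(0)
dim, num_experts = 8, 4
# Each "expert" here is just a small linear map for illustration.
mats = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
experts = [lambda x, m=m: x @ m for m in mats]
gate_w = rng.standard_normal((dim, num_experts))

y = moe_forward(rng.standard_normal(dim), experts, gate_w, top_k=2)
print(y.shape)  # (8,)
```

The cost argument is visible in the structure: total parameters grow with the number of experts, but per-request compute grows only with `top_k`.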

Then comes the adaptation layer: parameter-efficient fine-tuning with methods like LoRA, plus RLHF guided by internal subject matter experts. Again, sensible. Full retraining is expensive and usually unnecessary when customers need domain adaptation, terminology control, or policy alignment. PEFT gets most of the benefit without slowing deployment to a crawl.
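A minimal sketch of the LoRA idea, with invented dimensions: the base weight stays frozen and only a low-rank pair of matrices is trained, so the trainable parameter count drops from `d_in * d_out` to `r * (d_in + d_out)`, and at initialization the adapted layer behaves exactly like the base model:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA-style adapted linear layer: the frozen base weight W is
    combined with a low-rank update B @ A; only A and B are trained."""
    r = A.shape[0]                           # adapter rank (r << d)
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # small random init
B = np.zeros((d_out, r))                     # B starts at zero: no drift at init

x = rng.standard_normal(d_in)
# With B = 0, the adapted layer matches the base model exactly.
print(np.allclose(lora_forward(x, W, A, B), x @ W.T))  # True
```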

None of this is especially exotic. Enterprise infrastructure usually rewards repeatable choices over novelty.

Writer is trying to stand out on transparency. It says Palmyra supports explainable inference logs, bias audits, attention-weight logging, token probabilities, and structured reasoning traces for auditability.

Some of that is genuinely useful. Some of it needs hard questions.

Token probabilities and detailed trace logs can help with debugging, failure analysis, and compliance review. They can also create privacy and security problems if you're storing too much sensitive prompt context. The reasoning-trace piece is especially messy. Over the past year, the industry has backed away from treating raw chain-of-thought style traces as stable evidence. They're noisy, they can leak implementation details, and they may surface sensitive data. If Writer is offering reasoning transparency, buyers should ask what gets logged, who can see it, how long it's retained, and whether it's generated for audit presentation rather than exposed directly from the model.

That difference matters.

Orchestration is the real work

The most grounded part of Writer's stack is the agent orchestration layer.

According to the source material, Writer breaks capabilities into services such as DocumentSummarizer, ComplianceChecker, and DataEnricher, each running as a containerized microservice, with a central orchestrator routing requests based on task metadata. That's much closer to what production teams need than the giant all-purpose agent demos still making the rounds.
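A stripped-down sketch of that dispatch pattern, assuming each capability runs behind its own endpoint. The URLs and metadata keys below are hypothetical, not Writer's API:

```python
# Registry mapping task types to independently deployed services.
# Endpoints are placeholders for illustration only.
ROUTES = {
    "summarize":  "http://document-summarizer.internal/v1/run",
    "compliance": "http://compliance-checker.internal/v1/run",
    "enrich":     "http://data-enricher.internal/v1/run",
}

def route(task: dict) -> str:
    """Pick a downstream service from the task's metadata, failing
    loudly on unknown task types instead of guessing."""
    task_type = task.get("type")
    if task_type not in ROUTES:
        raise ValueError(f"no service registered for task type {task_type!r}")
    return ROUTES[task_type]

print(route({"type": "compliance", "doc_id": "abc-123"}))
# http://compliance-checker.internal/v1/run
```

The explicit registry is the point: routing decisions become data you can audit and version, rather than behavior buried inside one monolithic agent.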

Specialized services make life easier for a few basic reasons:

  • You can scale expensive tasks independently
  • You can put tighter controls around sensitive functions like compliance checks
  • Observability is cleaner by component
  • Failures are easier to isolate

If the summarizer has a latency spike, it shouldn't take down retrieval or policy evaluation. If the compliance service needs a stricter model version, you shouldn't have to redeploy the whole stack.

The Kubernetes deployment snippet in the source material is ordinary, and that's the point: fixed replicas, resource limits, environment variables pointing to a domain model. Enterprise AI at scale usually looks like standard distributed systems engineering with a probabilistic model in the middle.
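For readers who haven't seen one, a deployment along those lines looks roughly like this. Every name, image, and model ID below is a placeholder, not Writer's actual manifest:

```yaml
# Illustrative sketch only: fixed replicas, resource limits, and an
# environment variable pointing the service at a domain model.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compliance-checker
spec:
  replicas: 3                        # fixed replica count, no autoscaler
  selector:
    matchLabels:
      app: compliance-checker
  template:
    metadata:
      labels:
        app: compliance-checker
    spec:
      containers:
        - name: inference
          image: registry.internal/compliance-checker:1.4.2
          env:
            - name: MODEL_ID         # domain model this service calls
              value: palmyra-domain  # placeholder model identifier
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              cpu: "4"
              memory: 16Gi
```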

That's also where a lot of AI product talk goes off the rails. Teams like arguing about model benchmarks. The work that decides whether a system survives procurement is much duller: routing, retries, access controls, logging, cost allocation, versioning, and rollback.

The latency numbers need context

Writer says it can hit sub-200 ms response times for 8K context windows using tensor parallelism and NVIDIA TensorRT.

If that holds up in customer environments, it's strong. But nobody should build around that number without context. Median or tail latency? Under what concurrency? With what model size and output length? On what hardware? Was retrieval included? Were safety filters and audit logging in the loop?

Enterprise teams already know how this goes. "Sub-200 ms" can mean a lot of things if the benchmark setup is selective enough.
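One way to keep a vendor number honest is to measure median and tail latency yourself, under your own concurrency. A minimal harness, with a simulated heavy-tailed service standing in for the real client call:

```python
import concurrent.futures
import random
import statistics
import time

def timed_call(_):
    """Stand-in for one inference request; swap in a real client call.
    The simulated service has a heavy-tailed latency distribution,
    as real serving stacks usually do."""
    start = time.perf_counter()
    time.sleep(min(random.lognormvariate(-3.5, 0.6), 0.3))  # fake work
    return time.perf_counter() - start

random.seed(7)
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    latencies = sorted(pool.map(timed_call, range(200)))

p50 = statistics.median(latencies)
p99 = latencies[int(len(latencies) * 0.99) - 1]
# A healthy median can hide a slow tail; report both, plus concurrency.
print(f"p50={p50*1000:.1f} ms  p99={p99*1000:.1f} ms at 16-way concurrency")
```

Run the same harness with retrieval, safety filters, and audit logging in the loop, and the gap between marketing latency and production latency becomes a number instead of an argument.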

The optimization choices themselves are credible. TensorRT is a common way to squeeze more out of inference. Tensor parallelism also makes sense for larger deployments where one accelerator won't comfortably carry the load.

The bigger issue is that many enterprise workflows don't need chat-speed responses. They need predictable throughput and sane failure behavior. If an internal assistant takes 600 ms instead of 200 ms, people will cope. If a compliance agent drops a citation or produces an unsupported recommendation, they won't.

Security is where deals get won or lost

Writer's security posture, based on the details provided, is aimed straight at enterprise buyer anxiety.

The company says services run in VPC-isolated clusters, with mTLS for in-transit encryption and hardware-backed secure enclaves for runtime integrity checks. It also says sensitive inputs don't persist beyond the session and that RBAC is enforced at the inference API layer.

That's the right checklist. It's also table stakes if you want to sell into regulated industries in 2025.

The downside is operational drag. Strong isolation, granular access control, and strict data retention policies make debugging harder. So does immutable audit storage, which the source material suggests via append-only databases or even blockchain. The blockchain part sounds like conference-slide filler. Most teams should use append-only logs with strong retention and access policies, then move on.
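An append-only log with hash chaining gets most of the tamper-evidence benefit without a distributed ledger. A minimal sketch, with an invented record schema:

```python
import hashlib
import json

class AuditLog:
    """Append-only audit log: each record embeds a hash of the previous
    one, so any retroactive edit breaks the chain on verification."""

    def __init__(self):
        self.records = []
        self._prev = "0" * 64

    def append(self, event: dict) -> None:
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self._prev + body).encode()).hexdigest()
        self.records.append({"event": event, "hash": digest})
        self._prev = digest

    def verify(self) -> bool:
        prev = "0" * 64
        for rec in self.records:
            body = json.dumps(rec["event"], sort_keys=True)
            if hashlib.sha256((prev + body).encode()).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True

log = AuditLog()
log.append({"actor": "compliance-checker", "action": "flagged", "doc": "42"})
log.append({"actor": "reviewer-7", "action": "approved", "doc": "42"})
print(log.verify())                              # True
log.records[0]["event"]["action"] = "cleared"    # tampering...
print(log.verify())                              # ...is detected: False
```

In production the chain head would be anchored somewhere with strong access controls; the data structure itself stays this simple.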

Human review loops matter more than vendors like to admit. Writer's emphasis on daily SME sampling and correction feeding back into RLHF pipelines is smart. It's also expensive. High-trust systems in finance or clinical research need ongoing review labor. The fantasy version of enterprise agents removes humans. The systems that survive audits keep them involved, just in different places.

The useful lesson for developers

If you're leading a team that wants agents in production, the takeaway from Writer's pitch is straightforward: treat them like production services, with all the usual constraints.

Pick narrower task boundaries

A service that summarizes claims documents is easier to test, monitor, and govern than a general-purpose "insurance agent." Scope creep kills trust fast.

Log enough to debug, not enough to create a compliance mess

Auditability matters. So does data minimization. Store prompts, outputs, scores, and structured rationale where needed, but think hard before retaining raw internal traces or every token-level artifact.
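One way to split that difference is to hash raw prompts instead of storing them, keeping just enough metadata to correlate incidents. A sketch with an invented record schema:

```python
import hashlib

def audit_record(prompt: str, output: str, score: float) -> dict:
    """Keep what debugging needs; drop the raw prompt text but keep a
    hash so duplicate inputs remain linkable across incidents.
    Field names are illustrative, not a standard schema."""
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt_len": len(prompt),
        "output": output,
        "score": score,
    }

rec = audit_record(
    "Summarize claim #8841 for adjuster review.",
    "Claim summary...",
    0.91,
)
print(sorted(rec))  # ['output', 'prompt_len', 'prompt_sha256', 'score']
```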

Measure quality like an ops problem

Latency and token spend are easy to track. Hallucination rate, citation accuracy, escalation rate, and override frequency are harder, but those are the numbers users actually care about.
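Once SME review produces labeled samples, those rates fall out of ordinary aggregation. A toy sketch, with an invented review-record schema:

```python
# Hedged sketch: the review-record fields here are invented for
# illustration, not a vendor's schema.
reviews = [
    {"cited": True,  "escalated": False, "overridden": False},
    {"cited": True,  "escalated": True,  "overridden": False},
    {"cited": False, "escalated": True,  "overridden": True},
    {"cited": True,  "escalated": False, "overridden": False},
]

def rate(records, key):
    """Share of sampled outputs where the reviewer set this flag."""
    return sum(r[key] for r in records) / len(records)

citation_accuracy = rate(reviews, "cited")
escalation_rate = rate(reviews, "escalated")
override_rate = rate(reviews, "overridden")
print(citation_accuracy, escalation_rate, override_rate)  # 0.75 0.5 0.25
```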

Don't overdo domain fine-tuning too early

LoRA and other PEFT methods are useful, especially when terminology and policy precision matter. But a lot of teams still have bigger gains available from better retrieval, tighter prompt contracts, and cleaner workflow design.

Put policy controls outside the model

Compliance checking should rarely rely on model judgment alone. Pair generation with deterministic policy engines, validation layers, and hard failure modes.

A lot of agentic systems still break here. The model sounds confident, so people let it decide things that should be bounded by rules.
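A minimal sketch of that pattern, with made-up rules: deterministic checks run after generation and hard-fail rather than deferring to the model's confidence:

```python
import re

# Deterministic policy gate around a generator. The rules and the
# sample outputs below are illustrative assumptions, not a real policy.
POLICIES = [
    ("must cite a source", lambda text: bool(re.search(r"\[\d+\]", text))),
    ("no guarantees", lambda text: "guaranteed" not in text.lower()),
]

def policy_gate(draft: str) -> str:
    """Run every rule; raise on any violation instead of letting the
    model decide whether its own output is acceptable."""
    failures = [name for name, check in POLICIES if not check(draft)]
    if failures:
        raise ValueError(f"policy violations: {failures}")
    return draft

print(policy_gate("Returns averaged 4.1% over the period [1]."))
try:
    policy_gate("Profits are guaranteed to rise.")
except ValueError as e:
    print(e)  # policy violations: ['must cite a source', 'no guarantees']
```

The failure mode is the design choice: a rejected draft goes back for regeneration or human review, and the rule that tripped is logged, which is exactly the audit trail a model-only judgment can't provide.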

Writer is betting on boring AI

That's the bet underneath all of this.

Writer is selling a stack for companies that care less about flashy demos and more about dependable behavior under procurement scrutiny. Palmyra's architecture is modern, but conventional. The orchestration model is conventional too, and that's a point in its favor. The transparency pitch is interesting, though the reasoning-log claims deserve scrutiny. The security story is where it needs to be.

If Habib's Disrupt session is useful, it'll be because it pushes more teams to admit what enterprise AI deployment actually looks like: integration work, controls, review, and a model that fits the workflow instead of forcing the workflow to adapt.

Less glamorous. Much closer to how adoption actually happens.

What to watch

The caveat is that agent-style workflows still depend on permission design, evaluation, fallback paths, and human review. A demo can look autonomous while the production version still needs tight boundaries, logging, and clear ownership when the system gets something wrong.
