Enterprise AI keeps missing ROI. VCs think 2026 finally changes that
Venture investors are making the same call again: next year is when enterprise AI starts paying off.
This time, the pitch is less naive. TechCrunch surveyed 24 enterprise-focused VCs, and the themes were pretty clear. Less talk about bigger chatbots and shinier copilots. More talk about custom models, systems work, voice interfaces that can carry an actual workflow, and infrastructure designed around power limits instead of benchmark bragging.
That shift matters because the first wave mostly hasn't delivered. An MIT survey recently found that 95% of organizations still don't see meaningful ROI from AI. Three years after ChatGPT kicked off the buying spree, a lot of companies are still stuck between prototype purgatory and expensive demo mode.
So yes, VCs expect stronger enterprise adoption in 2026. Again. The useful part is what they think companies will pay for.
The easy phase is over
A lot of first-wave enterprise AI products rested on a flimsy idea: take a frontier model, wrap a UI around it, point it at company documents, and call it transformation.
That was enough to get pilots approved. It was also enough to expose the problems.
Generic models don't know a company's internal processes. Retrieval pipelines fail in boring and expensive ways. Latency gets ugly when you chain models together. Compliance teams don't want sensitive data leaving controlled environments. And if your edge is a prompt and a slick demo, one model release can wipe it out.
Investors seem to have adjusted. The newer thesis is much more operational. They're looking for companies that do some mix of:
- fine-tuning or adapter-based specialization
- robust evals tied to business outcomes
- model routing across open and closed systems
- observability across latency, cost, and failure modes
- governance, auditability, and data residency
- workflow integration that creates real switching costs
It's less glamorous. It's also much closer to what enterprises need.
Enterprises need customization because their operations are a mess
For most teams, the question has shifted. It isn't which model looks smartest in a demo. It's which setup is good enough, cheap enough, and governable enough for the job.
That pushes companies toward layered architectures instead of betting everything on one model.
A practical 2026 stack looks like this: use a large proprietary model for harder reasoning when data policy allows it, route routine classification or extraction to smaller open-weight models, keep retrieval hybrid with dense vectors and keyword search, and enforce structured outputs with json_schema or tool calling so results can plug into an ERP, EHR, or ticketing system without blowing up.
That last point gets underrated. Enterprises don't get paid for eloquent answers. They get paid for outputs that fit downstream systems and survive audits.
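The routing-plus-structured-outputs pattern can be sketched in a few lines. This is a minimal illustration, not a production router: the model tier names, task types, and invoice schema are all hypothetical, and a real system would call actual model endpoints where the stubs sit.

```python
import json

# Hypothetical routing table: task type -> model tier.
# The tier names are illustrative placeholders, not real endpoints.
ROUTES = {
    "classification": "small-open-weight",
    "extraction": "small-open-weight",
    "reasoning": "large-proprietary",
}

# A minimal structured-output contract for an extraction task:
# required fields and the Python types they must parse to.
EXTRACTION_SCHEMA = {"invoice_id": str, "amount": float, "currency": str}

def route(task_type: str) -> str:
    """Pick a model tier for a task, defaulting to the cheap tier."""
    return ROUTES.get(task_type, "small-open-weight")

def validate_output(raw: str, schema: dict) -> dict:
    """Parse model output as JSON and enforce field presence and types,
    so downstream systems (ERP, EHR, ticketing) never see free-form text."""
    data = json.loads(raw)
    for field, typ in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise ValueError(f"bad type for {field}: {type(data[field]).__name__}")
    return data

# A well-formed model response passes; anything malformed raises before
# it can reach a downstream system.
ok = validate_output(
    '{"invoice_id": "INV-9", "amount": 120.5, "currency": "EUR"}',
    EXTRACTION_SCHEMA,
)
```

In practice the validation step would enforce a provider's `json_schema` or tool-calling contract rather than this hand-rolled check, but the principle is the same: reject anything that won't survive the downstream system.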
Fine-tuning still matters, just in narrower ways than the market claimed two years ago. Full retraining is expensive and usually unnecessary. Adapter layers, LoRA-style approaches, or domain tuning on controlled datasets often get most of the benefit without turning model maintenance into its own department. If a company can avoid retraining every time policy shifts or terminology changes, it will.
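The economics behind that claim are easy to check with back-of-envelope arithmetic. LoRA replaces a full weight update with two low-rank factors, so the trainable parameter count per adapted matrix drops from d_in × d_out to r × (d_in + d_out). The dimensions below are illustrative, not tied to any specific model.

```python
# Back-of-envelope: parameters updated by full fine-tuning versus a LoRA
# adapter on a single weight matrix. Dimensions are illustrative.
d_in, d_out = 4096, 4096   # a typical transformer projection matrix
rank = 8                   # LoRA rank r

full_params = d_in * d_out            # every weight is trainable
lora_params = rank * (d_in + d_out)   # factors A (r x d_in) and B (d_out x r)

reduction = full_params / lora_params
print(f"full: {full_params:,}  lora: {lora_params:,}  ~{reduction:.0f}x fewer")
```

At these dimensions that is roughly a 256x reduction per matrix, which is why adapter swaps when policy or terminology shifts are cheap enough to be routine.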
On-prem and private inference are gaining ground for obvious reasons. Tools like vLLM, Hugging Face Text Generation Inference, and even simpler local deployment options like Ollama are now good enough to make "bring the model to the data" a real design choice. For regulated teams, that matters more than a few extra benchmark points.
RAG is maturing, slowly
Retrieval-augmented generation is still the default enterprise pattern, but the toy version is wearing out.
The better implementations are getting stricter in useful ways. Hybrid retrieval, where BM25 or another lexical method sits next to vector search, is becoming standard because semantic retrieval misses exact matches that legal, medical, and operational work can't afford to miss. If a contract clause or SKU code has to be exact, approximate meaning doesn't cut it.
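One common way to fuse the two rankings is reciprocal rank fusion (RRF). The sketch below stubs the lexical and vector results as ordered document-ID lists; in a real system those would come from BM25 and a vector index, and the doc IDs here are invented for illustration.

```python
# Minimal hybrid retrieval sketch: fuse a lexical ranking (e.g. BM25) with
# a vector ranking using reciprocal rank fusion (RRF).
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc by the sum of 1/(k + rank) across rankings, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Lexical search nails the exact clause ID; vector search adds paraphrases.
lexical = ["clause_7_2", "sku_118", "manual_p4"]
vector  = ["clause_7_2", "manual_p4", "policy_intro"]
fused = rrf([lexical, vector])
```

A document that both retrievers rank highly dominates the fused list, while exact-match hits that vector search misses entirely still survive, which is the property legal and operational workloads actually need.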
Data prep is getting more disciplined too. Good teams aren't dumping PDFs into a vector database and hoping for the best. They're chunking by document type, preserving metadata, separating collections for contracts versus logs versus manuals, and keeping lineage so they can answer a basic question: where did this output come from?
That's the dull plumbing behind trust. Without it, AI output stays politically fragile inside big companies.
The same goes for evals. Plenty of teams still score model quality in generic ways, then act surprised when nobody trusts the system. Stronger setups measure task success, refusal behavior, factuality, latency percentiles, and cost per request. In higher-stakes cases, they add human review loops because fully autonomous pipelines making uninspected decisions in claims processing or clinical admin are still a bad idea.
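A minimal version of that kind of outcome-level eval report can be computed from logged runs. The run records below are fabricated for illustration; real ones would come from your tracing pipeline, and the p95 here uses a simple nearest-rank estimate.

```python
import math

# Sketch of outcome-level eval metrics over logged runs: task success rate,
# p95 latency, and cost per *successful* task rather than cost per token.
def summarize(runs: list[dict]) -> dict:
    successes = [r for r in runs if r["task_ok"]]
    latencies = sorted(r["latency_ms"] for r in runs)
    # nearest-rank p95: smallest value with at least 95% of samples at or below it
    p95 = latencies[min(len(latencies) - 1, math.ceil(0.95 * len(latencies)) - 1)]
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "success_rate": len(successes) / len(runs),
        "p95_latency_ms": p95,
        "cost_per_success_usd": total_cost / max(1, len(successes)),
    }

runs = [
    {"task_ok": True,  "latency_ms": 420, "cost_usd": 0.01},
    {"task_ok": True,  "latency_ms": 610, "cost_usd": 0.02},
    {"task_ok": False, "latency_ms": 980, "cost_usd": 0.02},
    {"task_ok": True,  "latency_ms": 530, "cost_usd": 0.01},
]
report = summarize(runs)
```

Note that the failed run still counts toward cost: dividing total spend by successes, not by requests, is what surfaces an expensive-but-unreliable route.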
Voice is getting serious
One of the more interesting VC bets is that voice becomes a primary enterprise interface instead of a novelty add-on.
That holds up. Typing is fine for analysts and developers. It's a bad fit for dispatch, field service, support centers, warehouse floors, and anywhere workers already need their hands and eyes for something else. If speech systems can keep round-trip latency under roughly 300 milliseconds, handle barge-in properly, and stay accurate on domain vocabulary, voice becomes useful in a very practical way.
The technical requirements are tougher than product pages suggest. You need streaming ASR with partial hypotheses, endpointing that doesn't clip users or drag response time, low-latency TTS that doesn't sound absurd, diarization for multi-speaker settings, and decent noise handling. On-device wake words and neural codecs matter at the edge. So do pronunciation dictionaries for company names, part numbers, and acronyms.
Compliance matters just as much. Once voice sits inside customer support, healthcare, finance, or internal operations, recorded interactions become governance objects. Logging, redaction, retention policy, consent management, and regional rules stop being legal cleanup and become product requirements.
That's a lot more work than bolting speech I/O onto an existing chatbot.
Frontier labs are moving up the stack
Another theme in the VC comments deserves attention: frontier model providers may ship more turnkey apps themselves.
If OpenAI, Anthropic, Google, or another lab keeps pushing into domain workflows, a lot of AI startups lose the comfortable middle layer they've been occupying. The usual defense is familiar: enterprises want neutral vendors, better governance, deeper integrations, or domain specialization. Fair enough. It's still a tougher market.
The investor test is blunt: if a frontier lab ships something ten times better next quarter, does your company still have a reason to exist?
Technical buyers should use the same filter. If a vendor's moat is mostly prompt engineering or thin workflow glue, it's fragile. If the moat sits in proprietary data pipelines, embedded operational workflows, deep vertical integrations, and systems that respect policy boundaries, it's harder to dislodge.
That's why a lot of "software" companies are drifting toward forward-deployed services. Product alone often isn't enough. Customers need teams to embed, clean up data, redesign a process, build evals, and wire the system into ugly internal estates. Investors seem willing to fund that for now. It's less scalable than the old SaaS story, but it's closer to how enterprise AI actually gets adopted.
Power is now part of the product
One of the sharper parts of the 2026 thesis sits below the application layer: energy and infrastructure.
AI demand keeps hitting physical limits. GPU shortages are only part of it. Memory bandwidth, interconnects, cooling, rack density, and power budgets are all becoming first-order constraints. For inference at scale, HBM capacity and NVLink topology often matter as much as raw compute. Optical interconnects keep coming up for a reason. The bottleneck is often data movement, not FLOPs.
That changes how teams should think about performance. Tokens per joule sounds niche, but it's a useful planning metric. So is cost per successful task instead of cost per token. Quantization to INT8 or FP8, distillation, speculative decoding, and smarter routing to smaller models have moved out of the optimization footnotes. They're what make AI affordable enough to run in production.
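Tokens per joule is just arithmetic once you pin down throughput and power, and it feeds directly into capacity planning. Every number below is an illustrative assumption, not a vendor measurement.

```python
# Planning arithmetic for tokens-per-joule style capacity estimates.
# All figures are illustrative assumptions, not vendor measurements.
gpu_power_w = 700            # sustained board power for one accelerator
throughput_tok_s = 10_000    # decoded tokens/second at the serving batch size

tokens_per_joule = throughput_tok_s / gpu_power_w   # ~14 tokens per joule

# Yearly energy for a rack of 8 such GPUs at 60% average utilization,
# the kind of number a facilities or procurement team will actually ask for.
hours_per_year = 24 * 365
rack_kwh = 8 * gpu_power_w * 0.6 * hours_per_year / 1000
```

Quantization and routing show up directly in this math: double the tokens per second at the same power and the energy cost per token halves, before any hardware changes.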
This will shape procurement too. CIOs and platform teams are going to ask harder questions about thermal load, liquid cooling, energy reporting, and where workloads can run without tripping local power constraints. Startups selling AI infrastructure have a better case when they solve those problems instead of promising generic acceleration.
What technical teams should do
If you're building enterprise AI systems now, the advice is pretty straightforward. It's just more engineering-heavy than the market wanted to hear in 2023.
Start with one workflow that matters. Claims intake. Supplier onboarding. Internal support triage. Field maintenance guidance. Something with a measurable before and after.
Then build for control:
- use hybrid retrieval, not vectors alone
- enforce structured outputs
- instrument latency, token usage, and failure rates by route
- run continuous evals on task outcomes, not just model quality
- keep audit trails and access controls tied to data policy
- route small jobs to small models and save large models for expensive steps
- treat prompt injection and data exfiltration as standard threat models
If voice is part of the plan, test it in noise, with accents, interruptions, and ugly real-world vocabulary. Lab demos lie.
Keep optionality too. Support open-weight deployment where you need on-prem or sovereign control, with a path to external APIs where policy allows it. Betting the whole stack on one model vendor still looks reckless.
The broad shift in VC sentiment is healthy enough. Enterprise AI is starting to get judged like enterprise software and infrastructure instead of magic. That makes life harder for companies built on smoke.
For engineers, that's probably good news. The teams that win from here will be the ones that can make these systems dependable enough to survive procurement, compliance, operations, and the electric bill.
What to watch
The caveat is that agent-style workflows still depend on permission design, evaluation, fallback paths, and human review. A demo can look autonomous while the production version still needs tight boundaries, logging, and clear ownership when the system gets something wrong.