Quadric is cashing in on on-device AI, but the hard part is still software
Quadric has picked a very good time to sell AI processor IP.
The company says licensing revenue hit $15 million to $20 million in 2025, up from about $4 million in 2024. It also raised a $30 million Series C led by Accelerate Fund, which puts its post-money valuation around $270 million to $300 million, up sharply from the roughly $100 million level of its 2022 Series B.
That growth is notable. The more interesting part is the bet behind it.
Quadric is betting that AI inference is moving out of the cloud and into devices fast enough that chipmakers and OEMs want something more flexible than fixed-function neural blocks. It sells programmable AI processor IP, along with the compiler, runtime, and tools needed to run models locally on customer silicon. The target list is broad: laptops, automotive systems, industrial equipment, printers, and small local servers.
The timing makes sense. Model cycles are moving much faster than hardware cycles, and fixed silicon can age badly when the software target keeps moving.
Why the timing works
Quadric started in automotive, where local inference already had a clear case. Driver assistance, sensor fusion, voice interfaces, and other real-time workloads can’t wait on a cloud round trip. They also can’t casually send sensitive data off-device.
What changed over the past 18 months is that transformer-style inference spread into everything else. Laptop vendors want local assistants and document tools. Industrial vendors want vision systems that still work when the network doesn’t. Governments and enterprises talking about sovereign AI want inference to stay on local devices or nearby infrastructure they control.
That shift is strong enough that even companies comfortable with cloud economics are taking a harder look at where inference belongs. If a workload is steady, predictable, and already tied to the user’s device, sending every token generation or image pass to a hyperscaler starts to look expensive and unnecessary.
Latency matters. Cost matters. Data control matters. Networks still fail.
Quadric sits in a useful spot. It doesn’t sell finished chips. It licenses IP that customers can drop into their own SoCs, on their own process nodes, with their own supply chains. Customers named so far include Kyocera and Denso, and Quadric says the first laptops using its tech should ship this year.
Programmable hardware has an obvious appeal
The company describes its approach as CUDA-like programmable infrastructure for edge and device inference. That’s an ambitious comparison, but the idea itself is straightforward.
A lot of AI accelerators are heavily tuned for the operations that mattered when the block was designed. That works until the model mix changes. Then the hardware starts looking awkward against the new thing, whether that’s a different attention pattern, a new sparsity method, speculative decoding, or some ugly but effective inference trick everyone adopts a few months later.
Quadric’s answer is to keep the hardware more programmable and push more of the adaptation into software: compilers, runtimes, kernel scheduling, memory planning.
That matters because inference performance is often limited by memory movement, not raw arithmetic. For transformer workloads especially, efficient KV cache handling, smart weight staging, and overlapping DMA with compute can decide whether a design looks competitive or disappointing. A flexible runtime has a better chance of keeping up than a rigid datapath built around last year’s graph shapes.
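A rough back-of-envelope illustrates the point. Every model and hardware number below is an illustrative assumption, not a Quadric spec, but the shape of the math is why memory traffic tends to dominate batch-one decode.

```python
# Back-of-envelope: why decode-phase LLM inference is usually memory-bound.
# All model and hardware numbers are illustrative assumptions.

def bytes_per_decoded_token(params_b, bytes_per_weight, layers, kv_heads, head_dim, seq_len, kv_bytes):
    """Approximate bytes that must move per generated token at batch size 1."""
    weight_traffic = params_b * 1e9 * bytes_per_weight                   # every weight read once per token
    kv_traffic = 2 * layers * kv_heads * head_dim * seq_len * kv_bytes   # read K and V caches
    return weight_traffic + kv_traffic

# Assume a ~3B-parameter model, INT8 weights, FP16 KV cache, 2k context.
traffic = bytes_per_decoded_token(
    params_b=3, bytes_per_weight=1,
    layers=26, kv_heads=8, head_dim=128, seq_len=2048, kv_bytes=2,
)

dram_bandwidth = 50e9  # assume ~50 GB/s of LPDDR bandwidth available to the NPU
print(f"~{traffic / 1e9:.1f} GB per token -> "
      f"bandwidth ceiling ~{dram_bandwidth / traffic:.0f} tokens/s")
```

With numbers like these, the arithmetic units are rarely the bottleneck; how well the runtime stages weights and reuses the KV cache is.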
The downside is blunt: programmable hardware lives or dies on the software stack. If the compiler misses optimizations, if model export is brittle, if developers need to hand-write kernels just to get acceptable throughput, the value falls apart.
That’s the hard part, and it gets glossed over too often.
Why developers should care
For engineers building products, Quadric’s model is attractive for a simple reason. It reduces the odds of a hardware dead end.
If you’re designing a laptop, industrial controller, or automotive platform, you’re stuck with multi-year hardware cycles. You can’t respin silicon every time the ML world adopts a new attention variant or decides INT4 is good enough for another model class. A programmable inference engine gives you some room to absorb that change in software.
Some room, anyway.
The practical checklist is boring, which is why it matters:
- Can it run the model families you care about today, not on a roadmap slide?
- Does the toolchain support sane PyTorch-to-ONNX-style workflows? (A minimal export sketch follows this checklist.)
- How painful is quantization?
- What happens when weights don’t fit neatly in on-chip SRAM?
- Can you update models securely in the field?
Those questions decide whether a programmable NPU is useful or just theoretically flexible.
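For reference, the export step the checklist alludes to can be as small as this. The model here is a throwaway stand-in, and a vendor toolchain would consume the resulting .onnx file downstream.

```python
# Minimal PyTorch -> ONNX export, the kind of workflow the checklist asks about.
# The model is a placeholder; a real flow exports your production graph.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

    def forward(self, x):
        return self.net(x)

model = TinyClassifier().eval()
example = torch.randn(1, 128)

torch.onnx.export(
    model, example, "tiny_classifier.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},  # keep batch flexible
    opset_version=17,
)
# From here, the vendor's compiler, quantizer, and runtime take over.
```

The export itself is the easy part. The checklist questions are really about what happens to that graph afterward.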
For teams targeting local LLM inference, precision support is a big one. Support for FP16, BF16, INT8, and increasingly INT4 is table stakes for models in the 1B to 7B parameter range on laptops and compact edge systems. But a precision table on a slide doesn’t tell you much. The quality of the quantization pipeline matters just as much. Per-channel quantization, decent calibration flows, and fallback paths for layers that don’t quantize cleanly are where products either ship or stall.
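To make "per-channel quantization" concrete, here is a minimal numpy sketch of symmetric INT8 weight quantization. It deliberately skips activation calibration and mixed-precision fallback, which is exactly the machinery that separates a demo from a shippable pipeline.

```python
# Symmetric per-channel INT8 weight quantization, in minimal form.
# Real pipelines add activation calibration, fallback precision for sensitive
# layers, and accuracy checks -- this only shows the core idea.
import numpy as np

def quantize_per_channel_int8(weights: np.ndarray):
    """weights: (out_channels, in_features). Returns int8 weights plus per-channel scales."""
    max_abs = np.abs(weights).max(axis=1, keepdims=True)   # one range per output channel
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0)   # avoid divide-by-zero
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, scales.squeeze(1)

def dequantize(q: np.ndarray, scales: np.ndarray):
    return q.astype(np.float32) * scales[:, None]

w = np.random.randn(256, 128).astype(np.float32)
q, s = quantize_per_channel_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max reconstruction error: {err:.4f}")  # small for well-behaved layers
```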
Then there’s runtime behavior. Does the scheduler handle long sequence lengths without falling apart? How does it manage cache reuse? Can it support streaming inference for voice without ugly latency spikes? Can the compiler fuse operations well enough to avoid drowning in memory traffic?
Peak TOPS numbers get the headline. This is the part that decides whether anyone can ship.
The competition is real
Quadric is walking into a crowded market.
At the device level, Qualcomm, Apple, MediaTek, and every serious PC silicon vendor are pushing integrated NPUs. In EDA and licensable blocks, Synopsys and Cadence already have deep relationships with chipmakers. In embedded AI, NVIDIA Jetson still has huge mindshare because developers can actually get work done with it.
That last point matters. In AI hardware, software usually decides the winner. Developers will tolerate imperfect hardware if the tooling is solid. They’ll abandon promising hardware when the compiler is flaky, the runtime is opaque, or the benchmarks only show up under suspicious demo conditions.
Quadric’s pitch is chip-agnostic AI IP rather than a vertically integrated stack tied to one silicon vendor. That will appeal to OEMs that want supply-chain flexibility or don’t want their roadmap dictated by someone else’s NPU. It also fits the needs of sovereign AI projects in places like India and Malaysia, where local control matters at the infrastructure and procurement level.
But “chip agnostic” doesn’t make integration easy. It just moves the work around. Someone still has to make the memory hierarchy, interconnects, firmware, security, and model deployment pipeline behave like one system.
Cloud to device is happening, but it’s not a clean break
A lot of AI infrastructure coverage treats this as a neat swing from cloud to edge. It won’t be that tidy.
Much of the market is heading toward hybrid inference. Small and mid-sized models run locally for latency, privacy, and cost reasons. Bigger tasks, training, fleet analytics, and model updates stay in the cloud. The split depends on the product.
That means on-device AI hardware doesn’t need to replace cloud GPUs to matter. It only needs to take over the workloads that are expensive or irritating to keep remote.
That’s already a large market.
Voice assistants are the obvious example. Document understanding on enterprise laptops is another. Industrial inspection, local OCR, camera analytics, and in-car copilots all benefit from staying close to the sensor or the user. In those cases, cutting latency from hundreds of milliseconds to tens of milliseconds changes the product in a real way.
The economics are just as compelling. Sustained inference traffic in the cloud becomes a recurring tax. If the workload is stable and the device is capable, local execution often wins on total cost of ownership.
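A crude comparison shows why. Every price and volume figure below is an assumption, not a quoted rate, but the structure of the calculation is what a procurement team actually runs.

```python
# Back-of-envelope cloud vs. on-device cost for a steady inference workload.
# All prices and volumes are illustrative assumptions.

devices = 100_000                        # fleet size
tokens_per_device_per_day = 20_000       # steady assistant-style usage
cloud_price_per_million_tokens = 0.50    # assumed blended API price, USD
npu_bom_cost_per_device = 4.00           # assumed incremental silicon cost, USD
years = 3

cloud_cost = (devices * tokens_per_device_per_day * 365 * years
              / 1e6 * cloud_price_per_million_tokens)
device_cost = devices * npu_bom_cost_per_device

print(f"cloud over {years} years:   ${cloud_cost:,.0f}")
print(f"on-device (one-time BOM):  ${device_cost:,.0f}")
```

The exact crossover point moves with usage and pricing, but for steady, device-bound workloads the recurring cloud line tends to dwarf the one-time silicon cost.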
What to ask before buying the story
If you’re evaluating this as a technical buyer, the red flags are familiar.
Ask for real benchmarks. Not peak throughput on toy graphs. Ask for tokens per second on specific 3B to 7B models, end-to-end latency distributions for voice pipelines, and vision throughput at known resolutions with clear memory budgets.
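Better yet, measure the distribution yourself. This sketch assumes a run_inference() placeholder standing in for whatever runtime is being evaluated; the point is to report percentiles, not a single best-case number.

```python
# Measure an end-to-end latency distribution instead of trusting a peak figure.
# run_inference is a placeholder for the runtime call under evaluation.
import statistics
import time

def run_inference():
    time.sleep(0.012)  # stand-in for a real model call

def latency_profile(n_warmup=20, n_runs=200):
    for _ in range(n_warmup):
        run_inference()                  # warm caches, JIT, power states
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000)  # ms
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * len(samples)) - 1],
        "p99_ms": samples[int(0.99 * len(samples)) - 1],
        "max_ms": samples[-1],
    }

print(latency_profile())
```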
Ask how the stack handles weights that exceed local SRAM, because that’s where plenty of edge AI claims start to wobble.
Ask about security. On-device inference pushes responsibility onto the device maker. Secure boot, model signing, encrypted parameter storage, OTA updates, and rollback support stop being optional when the model is part of the product.
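The model-signing piece, at least, is mechanically simple. This sketch uses Ed25519 via the cryptography package; key handling and file contents are simplified placeholders, and secure boot, rollback protection, and OTA plumbing sit around it in a real product.

```python
# Verify a signed model blob before loading it -- a minimal model-signing sketch.
# In practice the private key lives in the vendor's release pipeline and only
# the public key ships on the device.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

signing_key = Ed25519PrivateKey.generate()
public_key = signing_key.public_key()

model_blob = b"...quantized weights and graph..."   # placeholder model artifact
signature = signing_key.sign(model_blob)            # done at release time

def load_model_if_trusted(blob: bytes, sig: bytes) -> bytes:
    try:
        public_key.verify(sig, blob)  # raises InvalidSignature on tampering
    except InvalidSignature:
        raise RuntimeError("model signature check failed; refusing to load")
    return blob                        # hand off to the runtime only after verification

load_model_if_trusted(model_blob, signature)
```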
And ask what happens when the model changes six months after tape-out. That is the whole case for programmability. If the answer involves a lot of custom engineering, the flexibility pitch is thinner than it sounds.
Quadric has clearly found demand. The revenue jump says that much. The first laptop launches will matter more than the funding round because shipped hardware has a way of stripping the theory out of edge AI.
If Quadric’s software stack holds up, it could end up in a strong position: close enough to the silicon to matter, flexible enough to survive model churn, and useful to OEMs that don’t want to wait on someone else’s NPU roadmap.
If the tooling lags, the story gets rough fast. In on-device AI, the compiler is often the product.