Former DeepMind researchers behind DeepStack raise Series A for trading AI
DeepMind poker veterans are turning reinforcement learning into a trading business
Three former DeepMind researchers who helped build DeepStack, the poker AI that beat top human players, have raised a big Series A for a very different arena: stock trading. Their Prague startup, EquiLibre Technologies, is now valued at $500 million after a new round led by Creandum.
The valuation matters less than the business. EquiLibre is trying to apply reinforcement learning to markets at hedge-fund scale, and by its own account the system is already trading billions of dollars in daily volume across the S&P 500 and Nasdaq through a partnership with Tower Research Capital. The startup also says its models have never had a negative month since launch.
That deserves a healthy squint. Trading results can look clean for a while, especially when capital deployment is selective and the strategy lives inside a broader quant stack. But the setup is interesting for a simpler reason: poker and market making share a brutal property that most machine learning products don’t. You get a numeric reward, often quickly, and the objective function is unforgiving.
Why poker people keep ending up in finance
The DeepMind trio, Martin Schmid, Rudolf Kadlec and Matej Moravcik, came up through one of the more successful lines of AI research: systems that learn by playing against themselves. DeepStack, their poker program, was a landmark because no-limit Texas hold ’em is a mess of hidden information, bluffing and probabilistic reasoning. It forced the system to reason under uncertainty instead of just memorizing patterns.
That maps cleanly onto trading. Markets are noisy, adversarial and partly hidden. You never see the full state. You only see prices, order flow, execution quality and whatever internal signals your own models generate. If you’ve got a model that can learn in that kind of environment, finance starts to look less like a leap and more like a familiar problem in a more expensive outfit.
Reinforcement learning is the obvious bridge. In RL, an agent tries actions, gets rewards and updates its policy based on what worked. In poker, the reward might be chips won. In trading, it’s P&L, adjusted for risk and costs. Simple on paper. Brutal in practice.
Markets punish sloppy reward design. A model can look good if it chases gross return while ignoring slippage, fees, inventory risk and regime shifts. Any serious trading system has to encode those constraints, or it will get an expensive lesson in reality.
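To make the reward-design point concrete, here is a minimal sketch of a per-step reward that subtracts trading costs and an inventory penalty from gross P&L. It is an illustration under stated assumptions, not EquiLibre's method: the names `StepCosts` and `step_reward`, the cost rates and the quadratic inventory penalty are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    """Hypothetical per-step cost parameters; the values are illustrative only."""
    fee_rate: float = 0.0002         # fees as a fraction of traded notional
    slippage_rate: float = 0.0005    # assumed slippage as a fraction of traded notional
    inventory_penalty: float = 0.01  # penalty per unit of squared position


def step_reward(position_before: float, position_after: float,
                price_before: float, price_after: float,
                costs: StepCosts) -> float:
    """Reward for one step: mark-to-market P&L minus costs and an inventory penalty."""
    # Gross P&L comes from holding the new position through the price move.
    gross_pnl = position_after * (price_after - price_before)
    # Trading costs scale with the notional traded to reach the new position.
    traded_notional = abs(position_after - position_before) * price_before
    trading_cost = traded_notional * (costs.fee_rate + costs.slippage_rate)
    # A quadratic penalty discourages carrying large inventory.
    inventory_cost = costs.inventory_penalty * position_after ** 2
    return gross_pnl - trading_cost - inventory_cost


# Buying 10 units at 100 and seeing the price move to 101.
reward = step_reward(0.0, 10.0, 100.0, 101.0, StepCosts())
```

An agent trained on `gross_pnl` alone would see a reward of 10 for that step. The adjusted reward is lower because the trade itself and the resulting inventory both carry a cost, which is the gap the paragraph above warns about.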
The appeal of RL in trading is real, and so are the traps
Investors keep funding teams like EquiLibre because the fit between RL and a real business is better than it used to be. RL used to sound like a research project in search of a business. Now it’s useful in places where the feedback loop is tight and the execution layer is automated.
Trading fits that mold. You can train on historical data, simulate execution and test strategies against real market conditions. If the evaluation is disciplined, you can measure whether a policy improves the actual trading process instead of just its paper backtest.
Trading is also one of the easiest places to fool yourself with ML.
A model can overfit to a regime that no longer exists. It can pick up spurious patterns from correlated assets. It can win in a backtest and lose money the minute it meets real liquidity constraints. A strategy can also work until everyone else copies it, or until the market shifts enough to compress the edge.
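One common guard against these failure modes is walk-forward evaluation, where a model is only ever tested on data that comes after its training window. The sketch below shows the index bookkeeping. The function name `walk_forward_splits` and its parameters are hypothetical, and nothing in the source says EquiLibre evaluates this way.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_samples: int, train_size: int, test_size: int
                        ) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where each test window follows its training window."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


# With 10 time-ordered samples, train on 4 and test on the next 2, then roll forward.
splits = list(walk_forward_splits(10, 4, 2))
```

Because every test index is later than every training index in the same split, the policy never sees the future during training. This does not remove regime risk, but it makes a backtest that only works through look-ahead much harder to produce by accident.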
EquiLibre’s answer appears to be scale and compute. The company says it’s trying to build one of the largest compute clusters in Central and Eastern Europe, and its founders have been explicit that they want to do more with fewer chips than giants like Jane Street. That’s a sensible goal. It’s also a hard one.
The best-funded quant firms don’t just buy GPUs. They buy data pipelines, execution infrastructure, research talent, low-latency systems and the patience to work through dead ends. Compute helps, but in trading, compute alone is rarely the bottleneck. Data quality and market access matter just as much.
Prague as a moat
EquiLibre’s location matters. The founders moved back to Czechia, recruited from a Czech diaspora network and built a 25-person team in Prague. That’s not a random detail. Talent gravity matters in AI, and retention does too. Prague is cheaper than San Francisco, but the bigger advantage may be focus. A smaller ecosystem can make it easier to keep researchers from drifting toward the next shiny lab down the street.
There’s also a strategic angle. Europe has a habit of underrating its own deep technical talent until it shows up in a funding round or a breakout company. EquiLibre joins a crop of frontier AI startups with strong research pedigrees, many tied to DeepMind alumni. The difference is that this one sits in a niche where revenue can arrive early if the models work.
That makes the company attractive to Creandum and probably to other investors as well. A lab that can produce signals with direct financial value has a clearer path to monetization than a general-purpose AI startup chasing enterprise pilots for two years before it finds product-market fit.
The “lab first” framing still matters. It suggests the company wants research autonomy, not just a hedge fund contract. That’s smart if it wants to keep pushing the frontier. It also creates tension. Trading businesses reward consistency and operational discipline. Research labs reward novelty. Those instincts don’t always sit well together.
Tower, Jane Street and the real competition
EquiLibre isn’t entering an empty market. It’s stepping into a field where automation is already deep and the leading firms are very good at keeping their edge private.
Tower Research Capital is already partnered with the startup. Jane Street says it uses RL and LLMs, along with “whatever else we need to train good models,” and claims access to tens of thousands of high-end GPUs. That tells you the market has moved far beyond simple statistical arbitrage. The contest is now about building better learning systems, feeding them higher-quality data and executing faster with fewer mistakes.
That’s bad news for anyone hoping to disrupt finance with a clever model and a demo deck.
The upside is that the market is big enough for specialized players to carve out space. EquiLibre’s CEO, Schmid, is right that this doesn’t have to be a winner-takes-all race. Unlike consumer social or search, trading doesn’t hinge on one universal model. Different time horizons, asset classes and risk constraints leave room for multiple approaches.
The bar is still high. To matter, EquiLibre has to show that its agents can survive through different volatility regimes, stay profitable after transaction costs and keep their edge as capital scales. That last part is where a lot of quant stories get ugly. A strategy that works at modest size can decay fast once it tries to absorb more flow.
What developers and AI teams should take from this
The technical lesson here is bigger than hedge funds.
Reinforcement learning still isn’t a default hammer for every problem. Most software systems don’t have dense, reliable reward signals. Markets do, which is why this class of company can exist at all. The same logic shows up in robotics, game playing, ad bidding and some forms of resource allocation. If the environment is messy but measurable, RL can be worth the pain.
The pain is the point, though. RL systems are expensive to train, hard to debug and easy to reward-hack. They need serious infrastructure around them: simulation, evaluation, monitoring, risk controls and a way to detect when the environment has drifted. In markets, drift is not a corner case. It’s the job.
For AI teams, the interesting part isn’t the model architecture. It’s the control system around it. How do you keep a policy from getting too confident? How do you cap exposure when the model starts behaving oddly? How do you check that a lift in returns isn’t just a backtest artifact? Those are engineering questions as much as research questions.
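As a rough illustration of what such a control layer can look like, the sketch below wraps a policy's target position in a position cap and a drawdown-based kill switch. The class `RiskGuard`, its fields and its thresholds are hypothetical; a production risk system would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class RiskGuard:
    max_position: float   # hypothetical absolute position cap
    max_drawdown: float   # hypothetical drawdown limit, in P&L units
    peak_pnl: float = 0.0
    halted: bool = False

    def update(self, cumulative_pnl: float) -> None:
        """Track the P&L high-water mark and halt once drawdown exceeds the limit."""
        self.peak_pnl = max(self.peak_pnl, cumulative_pnl)
        if self.peak_pnl - cumulative_pnl > self.max_drawdown:
            self.halted = True

    def filter(self, target_position: float) -> float:
        """Return the position the policy is allowed to hold."""
        if self.halted:
            return 0.0  # flatten once the kill switch has tripped
        return max(-self.max_position, min(self.max_position, target_position))


guard = RiskGuard(max_position=100.0, max_drawdown=50.0)
allowed = guard.filter(250.0)  # the policy asks for 250, the guard clamps to the cap
```

The design choice here is that the guard sits outside the learned policy and does not depend on it. The model can be as confident as it likes, but the exposure it is allowed to take is set by rules that are simple enough to audit.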
EquiLibre’s Series A says investors think the answers might be worth $500 million today. Maybe they’re right. Maybe the team really does have a durable edge from its DeepMind pedigree, its RL background and its unusual position between research and execution.
Or maybe it’s just early. Markets have a way of rewarding brilliance until they don’t.