What is General Intuition?

General Intuition is a New York-based AI startup spun out of Medal that is working on world models and agent training.

Why is gameplay video useful for AI agent training?

Gameplay video shows agents moving through dynamic environments, reacting to state changes, and making real-time decisions, which can help models learn prediction and control.

Artificial intelligence June 19, 2026

General Intuition seeks $300M at $2B valuation eight months after spinout

Who is reportedly backing General Intuition’s new funding round?

TechCrunch reports that the round includes backing from Jeff Bezos, Eric Schmidt, Khosla Ventures, and General Catalyst.

General Intuition is reportedly in talks to raise about $300 million at a valuation just above $2 billion, according to TechCrunch. For a company that spun out only eight months ago, that’s a heavy number. The investor case is fairly clear: General I...

General Intuition’s $300M raise puts a big price on game-trained AI agents

The New York-based startup spun out of Medal, the gaming clip platform, after raising a $134 million seed round in 2025. Its founding team includes Medal co-founder Pim de Witte, along with Eloi Alonso, Adam Jelley, and Vincent Micheli, researchers with backgrounds in world modeling and simulation.

TechCrunch reports that the new round includes backing from Jeff Bezos, Eric Schmidt, Khosla Ventures, and General Catalyst. The money is expected to go toward compute capacity, with a product launch targeted for late summer or early fall.

That timeline matters because world models are starting to move out of research demos and into commercial products. General Intuition is taking a specific angle: using world models internally to train agents, rather than selling world models as the product.

The dataset is the pitch

General Intuition’s main asset comes from Medal: roughly 2 billion videos per year from 10 million monthly active users.

The useful part is the type of video. Much of it comes from gameplay, often from a first-person perspective. Games produce dense examples of agents moving through environments, reacting to changing state, predicting consequences, and operating under constraints. A player turns a corner, tracks an opponent, manages inventory, aims, jumps, avoids hazards, reads a minimap, responds to audio cues, and updates plans in real time.

For AI researchers, the appeal is that this data has spatial and temporal structure: a model can watch actions unfold inside simulated environments where physics, objectives, camera motion, and feedback loops are tightly connected.

Passive internet video usually lacks that. A YouTube cooking clip shows the world changing, but the model often doesn’t know which action caused each state transition. Game footage is imperfect, but it sits closer to the kind of data needed to teach agents prediction and control.

There’s an obvious catch. Most Medal clips are likely observational rather than full environment logs. A rendered video tells you what appeared on screen, not necessarily the underlying game state, user inputs, object metadata, collision geometry, or reward signals. The difference is large: if General Intuition can pair video with richer interaction traces, it has a much stronger training signal. If it only has pixels and timestamps, the model has to infer actions from the footage itself, and the problem gets harder.

Even pixels alone at that scale have value. Video can teach continuity, occlusion, object permanence, motion priors, affordances, and cause-effect patterns. Those remain weak spots for many language-first agents.

Why world models matter for agents

A world model tries to learn how an environment behaves. Given recent observations and, in some cases, an action, it predicts future states. In robotics, games, autonomous driving, and simulated agents, that prediction loop is central.
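To make that prediction loop concrete, here is a minimal sketch of an action-conditioned world model in Python. It is a toy, not General Intuition's method: the `TabularWorldModel` class, the corridor states, and the logged transitions are all hypothetical, and a production system would predict frames or latent states with a neural network rather than count discrete outcomes.

```python
from collections import Counter, defaultdict


class TabularWorldModel:
    """Toy action-conditioned world model: counts (state, action) -> next_state."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        """Record one logged transition."""
        self._counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed next state, or None if unseen."""
        outcomes = self._counts.get((state, action))
        if not outcomes:
            return None
        return outcomes.most_common(1)[0][0]


# Hypothetical logged transitions from a 1-D corridor: (position, action, next position).
transitions = [
    (0, "right", 1),
    (1, "right", 2),
    (1, "left", 0),
    (2, "right", 2),  # wall: moving right at position 2 has no effect
    (1, "right", 2),
]

model = TabularWorldModel()
for s, a, s_next in transitions:
    model.observe(s, a, s_next)

print(model.predict(1, "right"))  # 2
print(model.predict(2, "left"))   # None (never observed)
```

The `None` result is the important part of the toy: a world model only knows the parts of the environment its data covers, which is why the scale and variety of the training corpus carry so much weight.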

A language model can write a plan. A world model can help an agent estimate what happens next.

If an agent sees a narrow platform, a moving obstacle, and a target location, it needs something closer to embodied reasoning than chat completion. It has to infer spatial relationships, timing, risk, and possible action sequences. In practice, that might mean predicting future frames, latent states, trajectories, rewards, or some combination of those.

World models are attractive because real-world trial and error is expensive. You don’t want a robot learning millions of failure cases by breaking hardware. You don’t want an autonomous system discovering edge cases only in production. Simulation gives teams scale, but hand-built simulators are brittle and costly. Learned simulators offer another route: train on large amounts of real or synthetic interaction data, then use the model as a sandbox for agent training.
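The sandbox idea can be sketched in a few lines. In this illustrative Python example, `model_step` is a hand-written stand-in for a learned dynamics model, and the corridor, goal cell, and reward values are assumptions chosen for clarity. The planner scores every short action sequence inside the model and never touches a real environment.

```python
from itertools import product

ACTIONS = ("left", "right", "stay")
GOAL = 3  # hypothetical target cell


def model_step(position, action):
    """Stand-in for a learned dynamics model on a corridor with cells 0..4."""
    delta = {"left": -1, "right": 1, "stay": 0}[action]
    return min(4, max(0, position + delta))


def imagined_return(start, plan):
    """Roll the plan forward inside the model and sum the rewards."""
    position, total = start, 0.0
    for action in plan:
        position = model_step(position, action)
        # Reward for sitting on the goal, small cost for every other step.
        total += 1.0 if position == GOAL else -0.1
    return total


def plan_with_model(start, horizon=4):
    """Enumerate every action sequence of length `horizon`; keep the best one."""
    return max(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: imagined_return(start, plan),
    )


best = plan_with_model(start=0)
print(best)  # ('right', 'right', 'right', 'stay')
```

Exhaustive enumeration only works at toy scale, since the number of plans grows exponentially with the horizon. Practical systems typically sample candidate plans or train a policy on imagined rollouts instead, which is where the compute bill comes from.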

That’s the bet behind a lot of current work. Runway, Decart, World Labs, and Google’s Genie line are all pushing world models in different directions. Some focus on controllable generated environments. Some target video creation. Others lean toward robotics or driving simulation. Google’s Genie 3, for instance, has started integrating Google Maps and Street View data to simulate real streets.

General Intuition’s claim is narrower and potentially more useful for developers building agents: the world model is infrastructure for training, not necessarily the thing customers directly use.

Selling agents instead of simulations

That distinction matters.

A company selling a world model needs developers, game studios, roboticists, or creative teams to build workflows around simulated environments. That can work, but it often turns into tooling, SDKs, and integration pain. Latency, determinism, scene consistency, object control, and evaluation all become product requirements.

General Intuition appears to be aiming at a different layer. It wants to train agents that already understand how to operate across space and time. The output isn’t a simulated room. The output is an agent that can perceive, anticipate, and act.

That’s a cleaner business pitch if the agent works. It’s also hard to judge from the outside.

For technical buyers, the immediate question is simple: what is the product surface?

There are several possible forms:

An API for embodied agents that can operate in simulated or game-like environments
A robotics policy model trained through learned simulation
A developer platform for training custom agents using General Intuition’s models
A vertical product for gaming, testing, automation, or simulation QA
A research platform for spatial-temporal reasoning benchmarks

The source report doesn’t specify. The late-summer or early-fall product launch should show whether General Intuition is shipping a model, an agent framework, an SDK, or a vertical application.

Until then, the valuation is pricing in data advantage and execution, not visible product-market fit.

Compute is the bottleneck, and probably the moat

The reported use of funds is compute expansion. That tracks.

Training useful video models is brutally expensive. Training models that support agent learning is worse, because the system may need to handle long-horizon sequences, action-conditioned prediction, multimodal input, and repeated rollouts. If the model is used for reinforcement learning or policy optimization, inference cost can dominate too, since every policy update depends on many simulated rollouts.

A large corpus of gameplay video creates storage, preprocessing, training, and rights problems:

Video normalization across games, resolutions, frame rates, overlays, and HUDs
Deduplication and clip quality filtering
Temporal segmentation to identify meaningful action sequences
Representation learning that separates camera movement from world movement
Handling copyrighted game content and user-generated material
Scaling distributed training without drowning in I/O
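As one illustration of the temporal segmentation step, the Python sketch below splits a clip into candidate action sequences wherever a per-frame motion score stays above a threshold. Everything here is assumed for the example: the scores are taken as precomputed (for instance, mean absolute difference between consecutive frames), and the `threshold` and `min_length` values are arbitrary. A real pipeline would likely use learned features rather than a fixed cutoff.

```python
def segment_by_motion(motion_scores, threshold=0.2, min_length=3):
    """Return (start, end) frame index pairs, end exclusive, of high-motion runs."""
    segments = []
    start = None
    for i, score in enumerate(motion_scores):
        if score > threshold and start is None:
            start = i  # a new high-motion run begins
        elif score <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))  # keep runs that are long enough
            start = None
    # Close a run that is still open when the clip ends.
    if start is not None and len(motion_scores) - start >= min_length:
        segments.append((start, len(motion_scores)))
    return segments


# Hypothetical per-frame motion scores for a 12-frame clip.
scores = [0.0, 0.1, 0.5, 0.6, 0.7, 0.1, 0.4, 0.05, 0.3, 0.3, 0.9, 0.8]
print(segment_by_motion(scores))  # [(2, 5), (8, 12)]
```

The single-frame spike at index 6 is dropped by the `min_length` filter, which is the kind of cheap quality gate that matters when the input is billions of clips.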

The last point is easy to underestimate. Video training pipelines often hit data throughput limits before they hit GPU math limits. Video is stored compressed, so feeding thousands of accelerators means decoding and augmenting it fast enough to keep them busy, and that takes serious infrastructure. For a startup, a $300 million raise can disappear quickly into GPU clusters, cloud commitments, storage, and engineering salaries.

Evaluation is another hard problem. Language models can be graded on coding tasks, math benchmarks, retrieval, or human preference tests. World models and embodied agents need messier metrics: prediction accuracy over time, controllability, policy transfer, task completion, robustness to distribution shift, and sim-to-real performance when robotics enters the picture.

A demo can look great for 30 seconds. Long-horizon consistency is where many systems crack.
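A simple way to see why is to measure error per step of an open-loop rollout, where the model is fed its own predictions instead of ground truth. The Python sketch below uses a hypothetical one-dimensional system and a deliberately biased model (`biased_model_step`) purely for illustration. Real evaluations would compare frames or latent states, not scalars.

```python
def rollout_error_by_horizon(model_step, initial_state, actions, true_states):
    """Feed the model its own predictions; return |predicted - true| per step."""
    errors = []
    state = initial_state
    for action, truth in zip(actions, true_states):
        state = model_step(state, action)  # open loop: no ground-truth correction
        errors.append(abs(state - truth))
    return errors


def true_step(x, a):
    """Ground-truth dynamics of the toy system."""
    return x + a


def biased_model_step(x, a):
    """Hypothetical learned model with a constant per-step bias of 0.5."""
    return x + a + 0.5


actions = [1, 1, 1, 1]
truth, x = [], 0
for a in actions:
    x = true_step(x, a)
    truth.append(x)

print(rollout_error_by_horizon(biased_model_step, 0, actions, truth))
# [0.5, 1.0, 1.5, 2.0]
```

A one-step error of 0.5 looks tolerable in isolation, but it compounds linearly here and can compound faster in nonlinear systems. That is why error as a function of horizon says more than a single-step accuracy number.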

Where game data helps, and where it breaks

Game footage is a strong training source because games compress many useful features into visually rich, goal-driven environments. They include physics approximations, navigation, opponents, tools, resources, maps, occlusion, partial observability, and fast feedback.

Games are also weird.

The physics may be stylized. Objects behave according to engine rules, not real-world dynamics. Characters respawn. Inventory systems have arbitrary logic. Cameras clip through walls. A player’s view often includes UI elements that leak information unavailable in physical environments. The distribution of actions is shaped by entertainment, not real-world utility.

That doesn’t kill the approach. It defines its limits.

For gaming agents, QA automation, simulation control, and virtual assistants inside 3D environments, the fit is obvious. For robotics, the transfer problem is much harder. A model trained heavily on first-person shooter or sandbox gameplay may learn useful priors about motion and space, but it still needs grounding in real sensors, real actuation, friction, lighting, object deformation, and failure modes.

The strongest version of General Intuition’s approach likely combines game video with other datasets: robotics trajectories, synthetic simulation, egocentric human video, 3D scene data, and possibly action labels where available. Medal gives the company scale and a distinctive starting point. It doesn’t solve embodiment by itself.

What technical teams should watch

For developers and AI engineering leads, the fundraising number is less interesting than whether General Intuition can expose something usable.

A credible launch should answer a few concrete questions:

Can developers control the agent? Agent behavior needs constraints, APIs, and observability. A black-box agent that “understands space” is hard to test and harder to trust.
What environments does it support? If the system works only in selected game-like demos, that’s still interesting, but narrow. If it generalizes across engines, scenes, and task formats, the technical bar is much higher.
How does evaluation work? Serious users will want task success rates, rollout stability, latency numbers, and failure cases. Pretty videos won’t be enough.
What are the integration points? Unity, Unreal, web-based 3D environments, robotics simulators, browser automation, and custom environments all imply different SDKs and runtime constraints.
What data rights are attached? Training on user-uploaded gameplay clips can raise legal and platform questions. Game publishers, streamers, and users may all have stakes in how that data gets used.
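On the evaluation question, a task success rate is more useful with an uncertainty range attached, because agent benchmarks often run only a few dozen episodes. The Python sketch below reports a success rate with a Wilson score interval and a median latency. The episode log and the `summarize` helper are hypothetical and do not reflect anything General Intuition has published.

```python
import math
import statistics


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial rate (z=1.96 is roughly 95%)."""
    if trials == 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def summarize(episodes):
    """episodes: list of (succeeded: bool, latency_ms: float) tuples."""
    successes = sum(1 for ok, _ in episodes if ok)
    low, high = wilson_interval(successes, len(episodes))
    return {
        "success_rate": successes / len(episodes),
        "success_rate_95ci": (low, high),
        "median_latency_ms": statistics.median(lat for _, lat in episodes),
    }


# Hypothetical evaluation log: 8 episodes, 6 successes.
episodes = [(True, 40.0), (True, 42.0), (False, 95.0), (True, 38.0),
            (True, 41.0), (False, 120.0), (True, 39.0), (True, 44.0)]
report = summarize(episodes)
print(report)
```

With only eight episodes the interval is wide, which is the point: a headline success rate from a small demo set says little until the sample size and failure cases are published alongside it.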

The security angle is worth watching too. Agents trained to operate in dynamic environments can become powerful automation tools. If the product targets games or virtual worlds, anti-cheat systems, bot detection, and platform abuse become immediate concerns. If it moves toward robotics or enterprise automation, safety policies and auditability matter even more.

A big valuation for an unresolved category

A $2 billion valuation for an eight-month-old spinout is aggressive. The investor logic is understandable: foundation models for agents need proprietary data, serious compute, and a team that understands simulation. General Intuition checks those boxes better than most new startups.

The open question is whether that becomes a durable product.

World models are drawing money because they address a real weakness in current AI systems. Text-first agents can call tools and write plans, but they often struggle with grounded reasoning, physical intuition, and long-horizon state. Models trained on interactive video may help close that gap.

General Intuition has an unusually large dataset and a plausible technical thesis. Now it has to prove that game-trained intuition can survive contact with real products, real developers, and real evaluation.
