Netflix used generative AI VFX in El Eternauta and cut one render by 10x
Netflix puts generative AI into final VFX, and that changes the production math
Netflix says it used generative AI to create final on-screen visual effects for the Argentine sci-fi series El Eternauta, with one building-collapse sequence rendered about 10 times faster than a traditional workflow. The important part is simple: this wasn't a concept demo, previs, or an internal test. Netflix says the AI-assisted work made it into the finished show.
Ted Sarandos framed it on Netflix's earnings call as a creator tool, not a headcount story. That's the expected line. The sharper point is technical and operational. If a major streamer can get acceptable final-shot quality from a hybrid GenAI pipeline, the VFX bottleneck starts to shift. Less time in labor-heavy shot production. More pressure on pipelines, review systems, model governance, and artist supervision.
Studios have been moving in this direction for a while. Netflix is one of the first to say it plainly.
Why this matters more than another AI feature release
Media companies have spent the past two years talking about AI in safer categories: recommendation systems, search, dubbing, metadata, ad targeting. Netflix already does plenty of that, and Sarandos also pointed to personalization, search, and interactive advertising plans. None of that is surprising.
Final footage is.
Studios have mostly kept generative models in ideation, quick mockups, synthetic background elements, and internal tools. Once model output survives into a released episode, the question changes. Now it's about how to run this inside a production pipeline without losing quality control or creating a mess around labor and rights.
That's a pipeline problem.
The 10x speedup claim matters too, assuming it holds up beyond one scene. Even if the average gain ends up smaller, a big reduction on specific shot classes (destruction, crowds, environment augmentation, cleanup) can move schedules in a real way.
What Netflix likely built under the hood
Netflix didn't publish an architecture diagram, so this part has to stay at the level of informed inference. Still, the outline is familiar.
A plausible stack looks something like this:
- Video diffusion models for generating temporally coherent motion across frames
- Neural rendering or NeRF-style scene representations to keep geometry and camera movement from drifting
- Segmentation and scene understanding models to isolate structures, debris, sky, actors, and practical elements
- Inpainting and compositing in a traditional VFX toolchain, probably with Nuke and similar software still doing plenty of the final assembly
- Human review loops to catch flicker, broken edges, weird object persistence, and the rest of video generation's usual failure modes
That last point matters. Nobody serious ships raw diffusion output on trust. The likely workflow is hybrid: practical footage and standard CG where needed, AI generation for certain effects layers or shot variants, then artist cleanup and compositing to make the result match the cinematography.
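Netflix hasn't described its review tooling, so treat the following as a rough illustration rather than its method. Automated screens usually run before human eyes, flagging the frames most likely to contain temporal artifacts so artists look there first. A minimal flicker check might look like this; the metric and the threshold are made up for the example.

```python
import numpy as np

def flag_flicker(frames: np.ndarray, threshold: float = 12.0) -> list:
    """Flag frame indices where mean absolute luminance change versus the
    previous frame spikes, a crude proxy for temporal flicker.

    frames: array of shape (n_frames, height, width), grayscale, 0-255.
    threshold: mean absolute difference above which a frame is flagged.
    Both the metric and the threshold are illustrative, not a production QC spec.
    """
    flagged = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.float32) - frames[i - 1].astype(np.float32))
        if diff.mean() > threshold:
            flagged.append(i)
    return flagged

# Example: 48 synthetic frames with an artificial luminance pop at frame 30.
clip = np.full((48, 540, 960), 120, dtype=np.uint8)
clip[30] += 40  # simulate a one-frame brightness jump
print(flag_flicker(clip))  # -> [30, 31]: the jump in and the jump back out
```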
This also wasn't a text box and a lucky prompt. Professional pipelines don't work that way if the shot has to hold up on a 4K TV.
A more likely path:
- Ingest plate footage, camera data, and maybe LIDAR or other scene scans.
- Run object detection and segmentation to identify structural regions and moving subjects.
- Generate destruction or environment changes with a video model conditioned on the shot layout.
- Post-process for motion blur, grain, lens characteristics, and color match.
- Put humans in review and send bad frames back through the loop.
That's where the engineering work sits. Not in making one impressive clip. In building a repeatable system that can produce hundreds of frames without temporal artifacts or style drift.
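As a sketch only, here is what that loop could look like as orchestration code, assuming hypothetical stage functions for segmentation, generation, post-processing, and review. None of these names correspond to Netflix tooling; the stubs just make the shape of the system concrete.

```python
from dataclasses import dataclass, field

# All names below are hypothetical stand-ins for real pipeline stages,
# not Netflix's tooling.

@dataclass
class Shot:
    shot_id: str
    plate_path: str
    notes: list = field(default_factory=list)

def segment_plate(shot: Shot) -> dict:
    # Stand-in for object detection / segmentation of structures and actors.
    return {"masks": f"{shot.plate_path}.masks"}

def generate_effect(shot: Shot, layout: dict) -> str:
    # Stand-in for a video model conditioned on the shot layout.
    return f"{shot.plate_path}.gen_v1"

def post_process(render: str) -> str:
    # Stand-in for grain, motion blur, lens characteristics, and color match.
    return render + ".graded"

def review(render: str) -> bool:
    # Stand-in for artist/supervisor review; here it always approves.
    return True

def run_shot(shot: Shot, max_attempts: int = 3):
    """Run one shot through the loop; resubmit on a rejected review."""
    layout = segment_plate(shot)
    for attempt in range(1, max_attempts + 1):
        render = post_process(generate_effect(shot, layout))
        if review(render):
            return render
        shot.notes.append(f"attempt {attempt} rejected")
    return None

print(run_shot(Shot("ep03_0420", "plates/ep03_0420.exr")))
```

The point of the structure is the retry path: rejected renders go back through generation with notes attached, which is where the hundreds-of-frames problem actually gets managed.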
Why 10x faster is plausible, and where it probably isn't
The easy mistake is to read "10x faster rendering" as a blanket claim about all VFX work. It isn't that.
Traditional CGI and simulation-heavy pipelines are slow because they pile up expensive steps: modeling, rigging, physics simulation, lighting, rendering, iteration, compositing, revisions. A generative workflow can skip parts of that stack for certain shots. If the goal is a convincing collapse from a fixed angle for a short duration, a model may get to final-shot quality much faster than a fully simulated destruction pipeline.
That gets more believable when:
- the shot is short
- the camera path is known
- the destruction doesn't need exact physical interaction with many foreground elements
- the sequence can tolerate approximate behavior as long as it reads as convincing
It gets weaker for shots with close character interaction, strict object continuity, or long takes where temporal errors pile up. Physics still matters. So does editability. If a director wants precise control over when a wall shears off, where debris lands, and how dust interacts with actors, traditional simulation is often the safer choice.
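One way to make that split concrete is a triage heuristic: score a shot on the properties above and route only strong candidates to the generative path. The weights and cutoffs below are invented for illustration, not drawn from any studio's rules.

```python
def generative_path_score(duration_s: float,
                          camera_path_known: bool,
                          foreground_interactions: int,
                          needs_exact_physics: bool) -> float:
    """Rough candidate score for routing a shot to a generative workflow.

    Mirrors the criteria above: short shots with known camera paths and
    little foreground interaction score high; strict physics or continuity
    requirements push the shot back toward traditional simulation.
    All weights are illustrative only.
    """
    score = 1.0
    if duration_s > 6.0:
        score -= 0.3          # long takes let temporal errors accumulate
    if not camera_path_known:
        score -= 0.25         # unknown camera moves make conditioning harder
    score -= 0.1 * min(foreground_interactions, 4)  # actors touching debris, etc.
    if needs_exact_physics:
        score -= 0.4          # director needs precise control over the event
    return max(score, 0.0)

# A short, locked-off collapse with no actor contact scores well...
print(generative_path_score(4.0, True, 0, False))   # 1.0
# ...a long take with close character interaction does not.
print(generative_path_score(12.0, False, 3, True))  # 0.0 -> keep it in simulation
```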
So yes, 10x on a scene category is believable. Tenfold improvement as a general statement about VFX production isn't.
What to watch
The main caveat is that an announcement does not prove durable production value. The practical test is whether teams can use this reliably, measure the benefit, control the failure modes, and justify the cost once the initial novelty wears off.