Wan 2.2 is a major step up from 2.1: a Mixture-of-Experts (MoE) architecture, a much larger training set, and a new 5B “hybrid” model that can crank out 720p video at 24 fps on a single 4090. Below you’ll find (1) the key upgrades, (2) an updated prompt-writing framework, and (3) several ready-to-run sample prompts.
Wan 2.2 vs 2.1 at a Glance
| Feature | Wan 2.1 | Wan 2.2 |
| --- | --- | --- |
| Core architecture | Dense diffusion | MoE diffusion: high-noise + low-noise experts hand off mid-denoise |
Sharper frames, fewer artefacts – The MoE hand-off cleans up fine detail without killing global coherence.
Better motion – Fast pans, parallax pulls and multi-object scenes survive more often.
Cheaper to tinker – The 5B TI2V model fits into 8 GB with off-loading, which makes it well suited to local prototyping.
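To make the hand-off idea concrete, here is a minimal Python sketch of routing denoising steps between a high-noise and a low-noise expert. It is illustrative only: the function names, the linear noise schedule, and the 0.5 boundary are assumptions made for this example, not Wan 2.2's actual schedule or switch point.

```python
from typing import List


def pick_expert(noise_level: float, boundary: float = 0.5) -> str:
    """Route one denoising step to an expert.

    noise_level runs from 1.0 (pure noise) down to 0.0 (clean frame).
    The boundary is a hypothetical switch point chosen for illustration.
    """
    return "high_noise" if noise_level >= boundary else "low_noise"


def expert_schedule(num_steps: int, boundary: float = 0.5) -> List[str]:
    """List which expert handles each step of a linearly decreasing schedule."""
    # Assumed schedule: step i runs at noise level 1 - i / num_steps.
    levels = [1.0 - i / num_steps for i in range(num_steps)]
    return [pick_expert(level, boundary) for level in levels]
```

Under these assumptions, the early (noisy) steps go to one expert and the later steps to the other, which is the "hand off mid-denoise" behaviour described in the table.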
Prompt Writing Framework (Wan 2.2 Edition)
As with Wan2.1, aim for prompts of 80-120 words. Under-specify and the MoE fills in its own "cinematic" defaults – sometimes great, often random.
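A quick way to check a draft against the 80-120 word target is to count whitespace-separated words. The helper below is a small sketch; the function names and the "short" / "ok" / "long" labels are made up for this example, and the model may count tokens differently from words.

```python
def prompt_word_count(prompt: str) -> int:
    """Count whitespace-separated words in a prompt."""
    return len(prompt.split())


def check_prompt_length(prompt: str, low: int = 80, high: int = 120) -> str:
    """Classify a prompt as 'short', 'ok', or 'long' against the target range."""
    n = prompt_word_count(prompt)
    if n < low:
        return "short"
    if n > high:
        return "long"
    return "ok"
```

Running `check_prompt_length(draft)` before queuing a render gives a cheap reminder to add shot, lighting, or style detail when a draft comes back as "short".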
1) Shot Order
Lead with what the camera sees first, then describe the subsequent moves:
Opening shot -> Camera motion -> Reveal / pay-off
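If you assemble prompts programmatically, the shot order above can be enforced with a small structure. This is only a sketch: `ShotPlan` and its field names are hypothetical helpers for this example, and the sample text is taken from the samurai pull-back prompt used later in this post.

```python
from dataclasses import dataclass


@dataclass
class ShotPlan:
    opening: str          # what the camera sees first
    camera_motion: str    # the move that follows
    reveal: str           # the pay-off
    style: str = ""       # optional lighting / mood notes

    def to_prompt(self) -> str:
        """Join the parts in shot order, one sentence per part."""
        parts = [self.opening, self.camera_motion, self.reveal, self.style]
        sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
        return " ".join(sentences)


plan = ShotPlan(
    opening="Close up shot of the determined face of a battle-worn samurai",
    camera_motion="Camera pulls back to reveal him standing alone on a foggy battlefield, gripping his katana",
    reveal="Camera pulls back to reveal fallen warriors behind him",
    style="Wind whips through the trees, sending red autumn leaves swirling",
)
print(plan.to_prompt())
```

Keeping the parts separate also makes it easy to change one element, such as the camera move, while leaving the rest of the prompt untouched.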
2) Camera Language
When we tested Wan2.1, we found that it did not reliably follow every camera movement we specified. Wan2.2 has improved considerably in this area.
Wan2.2 works best with clips no longer than 5 seconds. You control the length through the combination of frame count and frames per second.
frame count: <= 120 works well.
resolution: 960x540 for quick tests; 1280x720 for publication
frames/second: the default is 24; for quick tests, we often use 16
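Clip length is simply frame count divided by frames per second, so the settings above interact. The sketch below shows the arithmetic; the function names are made up for this example and do not correspond to any ComfyUI node or Wan 2.2 parameter.

```python
def clip_seconds(frame_count: int, fps: int) -> float:
    """Duration of a clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def max_frames(max_seconds: float, fps: int) -> int:
    """Largest whole frame count that stays within max_seconds at the given fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return int(max_seconds * fps)


print(clip_seconds(120, 24))  # 5.0
print(clip_seconds(120, 16))  # 7.5
print(max_frames(5, 16))      # 80
```

At the default 24 fps, 120 frames is exactly 5 seconds. At 16 fps the same 120 frames runs 7.5 seconds, so by this arithmetic a quick test at 16 fps needs about 80 frames to stay within the 5-second range.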
6) Negative Prompt
The negative prompt is now more consistently respected.
We've generally left the default in:
"bright colors, overexposed, static, blurred details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers, still picture, cluttered background, three legs, many people in the background, walking backwards"
Sample Results
We ran a few experiments on prompt adherence with camera motions. These are some of our earliest experiments with Wan2.2. We will soon publish a more detailed prompt guide after more thorough experimentation.
Neon Drift (cyberpunk tracking shot)
Prompt: "A rainy night in a dense cyberpunk market, neon kanji signs flicker overhead. The camera starts shoulder-height behind a hooded courier, steadily tracking forward as he weaves through crowds of holographic umbrellas. Volumetric pink-blue backlight cuts through steam vents, puddles mirror the glow. Lens flare, shallow depth of field. Moody, Blade-Runner vibe."
Alpine Reveal (pull back)
Prompt: "Extreme close-up of a mountaineer’s ice axe biting into frozen rock. Camera dollies back and tilts up simultaneously, revealing the climber and a vast sunrise-lit alpine ridge behind him. Crisp morning air, golden rim-light, subtle lens flare."
The initial result here seems to have ignored the desired camera movement completely.
Aquatic Ballet (slow motion orbit)
Prompt: "An orca breaches in crystal-clear Arctic waters. Slow 360° orbital shot around the soaring whale as droplets hang suspended. Soft polar sunset lights the scene in pastel pinks and blues; cinemagraphic HDR."
Here we generated the default 5-second video, which seems to have been a bit long for this prompt. The slow-motion element was respected, but the camera orbit was completely ignored.
Camera Movement Comparison with Wan2.1
In a previous post, we tried a number of prompts to get precise control over camera movements with Wan2.1. Those experiments required repeated, careful prompt tweaking to get the effect we wanted, and even then some camera movements were very difficult to achieve. With Wan2.2, we went back to the same prompts to see how much had improved.
Pan Left/Right
In our original prompt to achieve panning motion, we found that getting the right direction was extremely difficult (it was basically up to random chance), and Wan2.1 produced discontinuities in camera movement:
A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar, playing the piano with concentration. He wears a white shirt with suspenders and black trousers, his hands move rapidly on the keys. Camera pans left to low angle shot of a cute girl with pigtails and glasses playing the trumpet.
With Wan2.2, we could successfully control pan direction on the first try:
Same prompt as above.
We changed the pan direction in the prompt to "right".
With Wan2.1, we could not achieve a fast whip pan effect. The same remains true with 2.2, although the output from 2.2 is far superior to anything we achieved with 2.1:
We changed the pan effect in the prompt to "whip pan"
Pull back
With Wan2.1, camera pull back worked quite well, and of course, it also does so with 2.2:
Close up shot of the determined face of a battle-worn samurai. Camera pulls back to reveal him standing alone on a foggy battlefield, gripping his katana. Camera pulls back to reveal fallen warriors behind him. Wind whips through the trees, sending red autumn leaves swirling.
Dolly in/out
With Wan2.1, we could successfully and reliably achieve a dolly-in effect, but dolly-out would reliably fail.
This was our best result attempting dolly-out with Wan2.1; it could only dolly in:
With Wan2.2, we used the same prompt, changing only the camera effect, and got the following results on the first attempt for dolly-in and dolly-out:
In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words "Breaking Bad" written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. Camera hitchcock zooms in. The background shows an abandoned, dim factory with light filtering through the windows. There's a noticeable grainy texture. A medium shot with a straight-on close-up of the character.
In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words "Breaking Bad" written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. Camera dollies out. The background shows an abandoned, dim factory with light filtering through the windows. There's a noticeable grainy texture. A medium shot with a straight-on close-up of the character.
Not only does Wan2.2 respect these camera movements, it creates far smoother motions as well.
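Changing only the camera instruction while holding the rest of the prompt fixed is easy to script when you want to compare several moves. The following is a sketch under assumptions: the `[CAMERA]` placeholder and `camera_variants` helper are hypothetical conventions for this example (not a Wan 2.2 feature), and the template is an abbreviated version of the prompt above.

```python
from typing import Dict, Iterable

CAMERA_SLOT = "[CAMERA]"  # hypothetical placeholder token for the camera sentence


def camera_variants(template: str, moves: Iterable[str]) -> Dict[str, str]:
    """Return one prompt per camera move, differing only in the camera sentence."""
    if template.count(CAMERA_SLOT) != 1:
        raise ValueError("template must contain exactly one camera slot")
    return {move: template.replace(CAMERA_SLOT, move) for move in moves}


template = (
    "Walter White sits in a metal folding chair wearing a yellow protective suit. "
    "[CAMERA] "
    "The background shows an abandoned, dim factory with light filtering through the windows."
)
variants = camera_variants(template, ["Camera dollies in.", "Camera dollies out."])
for move, prompt in variants.items():
    print(move, "->", prompt)
```

Because every variant shares the same subject, lighting, and style text, any difference between the rendered clips can be attributed to the camera instruction (and the random seed, if that is not fixed).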
Tilt
As with dolly-out, we had a hard time achieving a tilt effect with Wan2.1:
Wan2.1 could not produce a great tilt effect
Wan2.2 once again performs incredibly well.
A close-up shot of the feet of a man wearing mountaineering gear, standing in a grassy field. Camera slowly tilts up, revealing the full body of a mountaineer wearing gear. In the distance, majestic rocky mountains tower above.
Tracking Shot
Like Wan2.1, Wan2.2 excels at tracking shots:
Wan2.1 Tracking shot
A sprawling cyberpunk metropolis, neon lights reflecting off rain-soaked streets. Pedestrians in futuristic outfits rush by as holographic advertisements flicker in the air. The camera follows a hooded figure in a long tracking shot, weaving through the crowded market. Overhead lights cast a moody glow, while fog drifts through the alleyways. The scene is dark and mysterious, with blue and purple lighting creating a high-tech, dystopian feel.
Crash Zoom
Crash zooms are very handy for a comedic effect. But with Wan 2.1, they were nearly impossible:
Attempting crash zooms just led to bad transitions in Wan2.1
Wan2.2 excels where its predecessor failed:
In a large dimly lit midcentury modern room, a man sits with an authoritative and pensive pose on a leather chair. He is wearing a dark suit jacket and grey trousers. He has silver hair. The chair is in the center of the screen. Behind the chair, there is an oak console with a lamp. The wall is made of oak panels. The man looks directly at the camera. Camera rapidly zooms in on the man's face. Then he lets out a slight smirk.
Camera Roll
Camera rolls are used in videography to give a sense of unease, confusion, disorientation or instability.
Wan2.1 made it very difficult to achieve this effect. This was the best we could do after many attempts:
Our best camera roll with Wan2.1, after many attempts, was still not quite right.
Wan2.2 produced incredible results with the first attempt:
Overhead shot of a man fallen asleep on his desk in front of his computer. The room is dark except for the light from the monitor. The man's head is on his arms by the keyboard. Around the desk, there is a mess of papers and floppy disks. The camera rolls in full 360 motion.
Getting Started in ComfyUI
Wan2.2 is available in a few variations:
| Model Type | Model Name | Parameters | Main Function | Model Repository |
| --- | --- | --- | --- | --- |
| Hybrid Model | Wan2.2-TI2V-5B | 5B | Hybrid version supporting both text-to-video and image-to-video; a single model meets two core task requirements | |
The hybrid model is great for local setups if you're limited in GPU power. On InstaSD, we have set up the 14B-parameter workflows for text-to-video and image-to-video so you can run them on powerful GPUs right away.
Wan 2.2 goes a long way toward bridging the gap between open-source and the best proprietary generators, especially for motion and cinematic polish. Spend time on camera verbs and lighting adjectives; the MoE backbone will take care of the rest.
Happy prompting!