Wan2.2 – What's New & How to Write Killer Prompts

Wan 2.2 vs 2.1 at a Glance

Feature	Wan 2.1	Wan 2.2
Core architecture	Dense diffusion	MoE diffusion (A14B models): high-noise + low-noise experts hand off mid-denoise
Training data	Baseline set	+65.6% images, +83.2% videos
Aesthetic control	Basic tags	Cinematic-level labels (lighting, composition, colour)
Motion fidelity	Moderate	Large-scale complex motion; smoother, more controllable
Model line-up	14B T2V, I2V	14B T2V / I2V + 5B TI2V hybrid (720p on consumer GPUs)
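The table's "hand-off" is easiest to picture as routing by timestep: in the A14B models, one expert denoises the early, noisy steps and the other takes over below a boundary. The sketch below is a simplified illustration, not Wan's implementation. The 0.875 boundary ratio and 1000 training timesteps are assumptions based on our reading of the reference T2V configuration, and the evenly spaced schedule ignores the shifted schedules real samplers use.

```python
# Minimal sketch of timestep-based expert routing in a two-expert MoE denoiser.
# ASSUMPTIONS: 1000 training timesteps and a 0.875 boundary ratio; the evenly
# spaced schedule below is a simplification of real (shifted) samplers.

NUM_TRAIN_TIMESTEPS = 1000
BOUNDARY_RATIO = 0.875  # assumed value, not taken from this article


def pick_expert(t: float,
                boundary_ratio: float = BOUNDARY_RATIO,
                num_train_timesteps: int = NUM_TRAIN_TIMESTEPS) -> str:
    """Return which expert denoises timestep t (higher t = noisier latent)."""
    boundary = boundary_ratio * num_train_timesteps
    return "high_noise" if t >= boundary else "low_noise"


def expert_schedule(num_steps: int) -> list[str]:
    """Assign an expert to each step of an evenly spaced schedule from T down."""
    step = NUM_TRAIN_TIMESTEPS / num_steps
    timesteps = [NUM_TRAIN_TIMESTEPS - i * step for i in range(num_steps)]
    return [pick_expert(t) for t in timesteps]


if __name__ == "__main__":
    schedule = expert_schedule(20)
    print(schedule.count("high_noise"), schedule.count("low_noise"))  # 3 17
```

Only one expert runs at any given step, so the model gains total capacity while per-step compute stays roughly that of a single 14B expert.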

Why you should care

Sharper frames, fewer artefacts – The MoE hand-off cleans up fine detail without killing global coherence.
Better motion – Fast pans, parallax pulls and multi-object scenes survive more often.
Cheaper to tinker – The 5B TI2V fits into 8 GB of VRAM with offloading; perfect for local prototyping.

Prompt Writing Framework (Wan 2.2 Edition)

As with Wan2.1, aim for prompts of 80-120 words. Under-specify, and the MoE fills in its own "cinematic" defaults – sometimes great, often random.
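A quick length check against the 80-120 word target can save wasted renders. This is a minimal sketch of our own: it counts whitespace-separated words, which only approximates how the text encoder tokenizes a prompt, and the function name and verdict strings are hypothetical.

```python
# Minimal sketch: check a prompt against the 80-120 word target.
# Word count is approximated by whitespace splitting, not tokenizer tokens.

def check_prompt_length(prompt: str, low: int = 80, high: int = 120) -> tuple[int, str]:
    """Return (word_count, verdict) for a prompt."""
    n = len(prompt.split())
    if n < low:
        return n, "too short: the model may fill gaps with its own defaults"
    if n > high:
        return n, "too long: consider trimming"
    return n, "ok"


if __name__ == "__main__":
    print(check_prompt_length("A rainy night in a dense cyberpunk market."))
```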

1) Shot Order

Lead with what the camera sees first, then describe the subsequent moves:

Opening shot -> Camera motion -> Reveal / pay-off
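The shot order maps naturally onto a small template. The `ShotPrompt` class below is a hypothetical helper of our own, not part of any Wan or ComfyUI API; the usage example simply reassembles the Alpine Reveal prompt from the sample results in this order.

```python
# Minimal sketch: assemble a prompt in shot order
# (opening shot -> camera motion -> reveal/pay-off), then any extra detail.
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    opening_shot: str   # what the camera sees first
    camera_motion: str  # how the camera moves
    reveal: str         # what the move reveals (the pay-off)
    extras: list[str] = field(default_factory=list)  # lighting, grade, lens...

    def render(self) -> str:
        """Join non-empty parts with single spaces, preserving shot order."""
        parts = [self.opening_shot, self.camera_motion, self.reveal, *self.extras]
        return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    print(ShotPrompt(
        opening_shot="Extreme close-up of a mountaineer's ice axe biting into frozen rock.",
        camera_motion="Camera dollies back and tilts up simultaneously,",
        reveal="revealing the climber and a vast sunrise-lit alpine ridge behind him.",
        extras=["Crisp morning air, golden rim-light, subtle lens flare."],
    ).render())
```

Keeping the three slots separate makes it easy to swap only the camera motion while holding the opening shot and pay-off fixed.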

2) Camera Language

When we tested Wan2.1, we found it was not very reliable at following every camera movement we specified. Wan2.2 has improved considerably in this area.

pan left/right
tilt up/down
dolly in/out
orbital arc
crane up
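Because Wan2.1 often dropped camera moves, it helps to confirm that each prompt actually names the intended move in one of the phrasings above before comparing outputs. This is a rough sketch: the pattern table is our own assumption about common wordings, and a match only tells you the move is in the prompt, not that the model will obey it.

```python
# Minimal sketch: report which camera moves a prompt actually names.
# The regex phrasings are our own guesses at common wordings, not a Wan API.
import re

CAMERA_PATTERNS = {
    "pan": re.compile(r"\b(?:whip[- ])?pans?\s+(?:left|right)\b", re.IGNORECASE),
    "tilt": re.compile(r"\btilts?\s+(?:up|down)\b", re.IGNORECASE),
    "dolly": re.compile(r"\bdoll(?:y|ies)\s+(?:in|out|back)\b", re.IGNORECASE),
    "orbit": re.compile(r"\borbit(?:al|s)?\b", re.IGNORECASE),
    "crane": re.compile(r"\bcranes?\s+up\b", re.IGNORECASE),
}


def find_camera_moves(prompt: str) -> dict[str, str]:
    """Map each detected move type to the first matching phrase in the prompt."""
    found = {}
    for name, pattern in CAMERA_PATTERNS.items():
        match = pattern.search(prompt)
        if match:
            found[name] = match.group(0)
    return found


if __name__ == "__main__":
    print(find_camera_moves("Camera dollies back and tilts up simultaneously."))
```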

3) Motion modifiers

speed adjectives: slow-motion, rapid whip-pan, time-lapse
parallax cues: "foreground reeds sway, background mountains fixed"

4) Aesthetic tags

lighting: volumetric dusk, harsh noon sun, neon rim light, etc.
colour-grade: "teal-and-orange", "bleach-bypass", "kodak portra"
lens/style: anamorphic bokeh, 16mm grain, CGI stylized
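Motion modifiers and aesthetic tags usually end up together at the close of a prompt, as in the Neon Drift example further down. This sketch (a helper of our own, not a Wan feature) merges tag groups into one closing sentence and drops case-insensitive repeats. The ordering of groups is a stylistic choice we have not tested for any effect on the model.

```python
# Minimal sketch: fold motion modifiers and aesthetic tags into one closing
# sentence, keeping first occurrences and dropping case-insensitive repeats.

def aesthetic_sentence(*groups: list[str]) -> str:
    """Join tag groups (motion, lighting, grade, lens...) into one sentence."""
    kept: list[str] = []
    kept_lower: set[str] = set()
    for group in groups:
        for tag in group:
            tag = tag.strip()
            if tag and tag.lower() not in kept_lower:
                kept.append(tag)
                kept_lower.add(tag.lower())
    if not kept:
        return ""
    text = ", ".join(kept)
    return text[0].upper() + text[1:] + "."


if __name__ == "__main__":
    print(aesthetic_sentence(
        ["slow-motion"],                     # motion modifier
        ["volumetric dusk"],                 # lighting
        ["teal-and-orange"],                 # colour grade
        ["anamorphic bokeh", "16mm grain"],  # lens/style
    ))
```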

5) Temporal & spatial params

Wan2.2 works best with clips no longer than 5 seconds. You control clip length through the combination of frame count and frames per second:

frame count: 120 or fewer works well
resolution: 960x540 for quick tests, 1280x720 for publication
frames per second: 24 by default; we often use 16 for quick tests
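Clip length is simply frames divided by fps, so the two settings trade off against each other. The sketch below also snaps frame counts to the form 4k + 1; that constraint is our assumption, based on Wan's causal video VAE encoding one leading frame plus groups of four, so check what your own workflow accepts. The function names are hypothetical.

```python
# Minimal sketch: relate frame count, fps and clip length.
# ASSUMPTION: Wan's causal video VAE expects frame counts of the form 4k + 1
# (one leading frame plus groups of four); check your workflow's constraints.

def clip_seconds(frames: int, fps: int) -> float:
    """Clip length in seconds for a given frame count and frame rate."""
    return frames / fps


def snap_frames(frames: int) -> int:
    """Largest frame count of the form 4k + 1 that does not exceed `frames`."""
    return max(1, (frames - 1) // 4 * 4 + 1)


def frames_for(seconds: float, fps: int) -> int:
    """Frame count (4k + 1) for a clip of at most `seconds` at `fps`."""
    return snap_frames(int(seconds * fps))


if __name__ == "__main__":
    print(clip_seconds(120, 24), frames_for(5, 24), frames_for(5, 16))
```

At 24 fps, the 120-frame ceiling above is exactly 5 seconds; at 16 fps, the same 5-second cap allows only 80 frames.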

6) Negative Prompt

Wan2.2 respects the negative prompt more consistently than Wan2.1 did.

We've generally left the default in place:

"bright colors, overexposed, static, blurred details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers, still picture, cluttered background, three legs, many people in the background, walking backwards"
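If you do need extra terms, appending them to the default rather than replacing it keeps the baseline protections. A minimal sketch, assuming the negative prompt is treated as a comma-separated list; the function name and the "watermark" example term are our own.

```python
# Minimal sketch: extend a comma-separated negative prompt without duplicates.
# Duplicate detection is case-insensitive; original term order is preserved.

def extend_negative(base: str, extra: list[str]) -> str:
    """Append new terms to a negative prompt, skipping ones already present."""
    terms = [t.strip() for t in base.split(",") if t.strip()]
    seen = {t.lower() for t in terms}
    for term in extra:
        term = term.strip()
        if term and term.lower() not in seen:
            terms.append(term)
            seen.add(term.lower())
    return ", ".join(terms)


if __name__ == "__main__":
    print(extend_negative("blurred details, subtitles", ["Subtitles", "watermark"]))
```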

Sample Results

We ran a few experiments on prompt adherence with camera motions. These are some of our earliest experiments with Wan2.2. We will soon publish a more detailed prompt guide after more thorough experimentation.

Neon Drift (cyberpunk tracking shot)

Prompt: "A rainy night in a dense cyberpunk market, neon kanji signs flicker overhead. The camera starts shoulder-height behind a hooded courier, steadily tracking forward as he weaves through crowds of holographic umbrellas. Volumetric pink-blue backlight cuts through steam vents, puddles mirror the glow. Lens flare, shallow depth of field. Moody, Blade-Runner vibe."

Alpine Reveal (pull back)

Prompt: "Extreme close-up of a mountaineer’s ice axe biting into frozen rock. Camera dollies back and tilts up simultaneously, revealing the climber and a vast sunrise-lit alpine ridge behind him. Crisp morning air, golden rim-light, subtle lens flare."

Our first result seems to have ignored the requested camera movement completely.

Aquatic Ballet (slow motion orbit)

Prompt: "An orca breaches in crystal-clear Arctic waters. Slow 360° orbital shot around the soaring whale as droplets hang suspended. Soft polar sunset lights the scene in pastel pinks and blues; cinemagraphic HDR."

Here we generated the default 5-second clip, which seems to have been a bit long for this prompt. The slow-motion element was respected, but the orbital camera move was ignored completely.

Camera Movement Comparison with Wan2.1

In a previous post, we tried a number of prompts to get precise control over camera movements with Wan2.1. We had to tweak those prompts repeatedly and carefully to get the effects we wanted, and some camera movements remained very difficult to achieve. With Wan2.2, we went back to the same prompts to see how much had improved.

Pan Left/Right

With our original panning prompt, getting the right direction out of Wan2.1 was extremely difficult (basically down to chance), and the camera movement had discontinuities:

A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar, playing the piano with concentration. He wears a white shirt with suspenders and black trousers, his hands move rapidly on the keys. Camera pans left to low angle shot of a cute girl with pigtails and glasses playing the trumpet.

With Wan2.2, we could successfully control pan direction on the first try:

We changed the pan direction in the prompt to "right".

Wan2.1 could not produce a fast whip-pan effect, and the same remains true of Wan2.2, although its output is far superior to anything we achieved with 2.1:

We changed the pan effect in the prompt to "whip pan".
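Changing only the camera instruction between runs, as we did for the pan direction and whip pan, keeps comparisons fair. A small sketch of that setup, using a hypothetical `{camera}` placeholder of our own; the shortened template and the "whip pans right" wording are illustrative, not the exact text of our runs.

```python
# Minimal sketch: build prompt variants that differ only in the camera instruction.
# The "{camera}" placeholder convention is our own, not a Wan or ComfyUI feature.

PLACEHOLDER = "{camera}"


def camera_variants(template: str, moves: list[str]) -> dict[str, str]:
    """Return {move: prompt} with the placeholder swapped for each move."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER}")
    return {move: template.replace(PLACEHOLDER, move) for move in moves}


if __name__ == "__main__":
    template = ("A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar. "
                "Camera {camera} to low angle shot of a cute girl playing the trumpet.")
    for move, prompt in camera_variants(template, ["pans left", "pans right", "whip pans right"]).items():
        print(f"{move}: {prompt}")
```

Using `str.replace` rather than `str.format` means literal braces elsewhere in a prompt cannot break the substitution.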

Pull back

Camera pull-back already worked quite well with Wan2.1, and of course it still does with 2.2:

Close up shot of the determined face of a battle-worn samurai. Camera pulls back to reveal him standing alone on a foggy battlefield, gripping his katana. Camera pulls back to reveal fallen warriors behind him. Wind whips through the trees, sending red autumn leaves swirling.

Dolly in/out

With Wan2.1, we could reliably achieve a dolly-in effect, but dolly-out consistently failed.

This was our best attempt at dolly-out with Wan2.1; the model would only dolly in:

With Wan2.2, we reused the same prompt, changing only the camera instruction, and got the following dolly-in and dolly-out results on the first attempt:

In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words "Breaking Bad" written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. Camera hitchcock zooms in. The background shows an abandoned, dim factory with light filtering through the windows. There's a noticeable grainy texture. A medium shot with a straight-on close-up of the character.

Wan2.2 not only respects these camera movements but also produces far smoother motion.

Tilt

As with dolly-out, we had a hard time getting a tilt out of Wan2.1:

Wan2.2 once again performs incredibly well:

A close-up shot of the feet of a man wearing mountaineering gear, standing in a grassy field. Camera slowly tilts up, revealing the full body of a mountaineer wearing gear. In the distance, majestic rocky mountains tower above.

Tracking Shot

Like Wan2.1, Wan2.2 excels at tracking shots:

A sprawling cyberpunk metropolis, neon lights reflecting off rain-soaked streets. Pedestrians in futuristic outfits rush by as holographic advertisements flicker in the air. The camera follows a hooded figure in a long tracking shot, weaving through the crowded market. Overhead lights cast a moody glow, while fog drifts through the alleyways. The scene is dark and mysterious, with blue and purple lighting creating a high-tech, dystopian feel.

Crash Zoom

Crash zooms are handy for comedic effect, but with Wan2.1 they were nearly impossible:

Wan2.2 excels where its predecessor failed:

In a large dimly lit midcentury modern room, a man sits with an authoritative and pensive pose on a leather chair. He is wearing a dark suit jacket and grey trousers. He has silver hair. The chair is in the center of the screen. Behind the chair, there is an oak console with a lamp. The wall is made of oak panels. The man looks directly at the camera. Camera rapidly zooms in on the man's face. Then he lets out a slight smirk.

Camera Roll

Camera rolls are used in videography to give a sense of unease, confusion, disorientation or instability.

Wan2.1 made this effect very difficult to achieve. This was our best result after many attempts, and it is still not quite right:

Wan2.2 produced incredible results on the first attempt:

Overhead shot of a man fallen asleep on his desk in front of his computer. The room is dark except for the light from the monitor. The man's head is on his arms by the keyboard. Around the desk, there is a mess of papers and floppy disks. The camera rolls in full 360 motion.

Getting Started in ComfyUI

Wan2.2 is available in three variants:

Model Type	Model Name	Parameters	Main Function	Model Repository
Hybrid Model	Wan2.2-TI2V-5B	5 B	Hybrid version supporting both text-to-video and image-to-video; a single model meets two core task requirements	Wan2.2-TI2V-5B
Image-to-Video	Wan2.2-I2V-A14B	14 B	Converts static images into dynamic videos while maintaining content consistency and smooth motion	Wan2.2-I2V-A14B
Text-to-Video	Wan2.2-T2V-A14B	14 B	Generates high-quality videos from text descriptions with cinematic-level aesthetic control and precise semantic compliance	Wan2.2-T2V-A14B

The hybrid model is a good fit for local setups if you have limited GPU power. On InstaSD, we have set up the 14B-parameter workflows for text-to-video and image-to-video so you can run them on powerful GPUs right away:
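The choice between the variants in the table boils down to two questions: do you have an input image, and is your GPU limited? A trivial sketch of that decision, where `limited_gpu` is a hypothetical stand-in for your own judgement of available VRAM; we are not asserting any specific VRAM threshold.

```python
# Minimal sketch: choose a Wan2.2 variant from the model table.
# "limited_gpu" stands in for your own judgement of available VRAM.

def pick_wan22_model(has_input_image: bool, limited_gpu: bool) -> str:
    """Return the Wan2.2 model name suited to the task and hardware."""
    if limited_gpu:
        # The 5B hybrid handles both text-to-video and image-to-video.
        return "Wan2.2-TI2V-5B"
    return "Wan2.2-I2V-A14B" if has_input_image else "Wan2.2-T2V-A14B"


if __name__ == "__main__":
    print(pick_wan22_model(has_input_image=True, limited_gpu=False))
```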

Closing Thoughts

Wan2.2 goes a long way toward closing the gap between open-source and the best proprietary video generators, especially for motion and cinematic polish. Spend time on camera verbs and lighting adjectives; the MoE backbone will take care of the rest.

Happy prompting!

Workflows