Avatar · New

Wan 2.2 S2V

Animate a still photo with a speech track using Wan 2.2 Speech-to-Video. The output follows your audio with natural, speech-driven motion at 480p, 580p, or 720p.

3 credits per generation

Features

Photo + Audio Input
Speech-Driven Motion
480p / 580p / 720p
Audio-Length Output

Specifications

Resolution: 480p / 580p / 720p
Input: Photo + Audio + Prompt
Audio Limit: 7.5s
Output: MP4 Video

Input Requirements

Source Photo (required, image upload): Front-facing photo to animate
Audio File (required, audio upload): Speech audio to drive the motion (max 7.5s)
Scene Description (required, text area)
Resolution (optional, select): 480p / 580p / 720p
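
The input requirements above can be checked before uploading anything. The following is a minimal sketch only: `S2VRequest`, `validate_request`, and all field names are hypothetical and do not come from any published API for this model. It encodes just the constraints listed on this page (three required inputs, a 7.5s audio limit, and an optional resolution of 480p, 580p, or 720p).

```python
from dataclasses import dataclass
from typing import List, Optional

# Constraints taken from the Specifications and Input Requirements sections.
ALLOWED_RESOLUTIONS = ("480p", "580p", "720p")
MAX_AUDIO_SECONDS = 7.5


@dataclass
class S2VRequest:
    """Hypothetical container for one generation request (not a real API type)."""
    photo_path: str
    audio_path: str
    audio_seconds: float
    scene_description: str
    resolution: Optional[str] = None  # optional field on the form


def validate_request(req: S2VRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means the inputs look valid."""
    problems = []
    if not req.photo_path:
        problems.append("source photo is required")
    if not req.audio_path:
        problems.append("audio file is required")
    if req.audio_seconds > MAX_AUDIO_SECONDS:
        problems.append(f"audio is {req.audio_seconds}s, limit is {MAX_AUDIO_SECONDS}s")
    if not req.scene_description.strip():
        problems.append("scene description is required")
    if req.resolution is not None and req.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {', '.join(ALLOWED_RESOLUTIONS)}")
    return problems
```

A request with a 9-second clip, for example, would come back with a single problem describing the audio limit, so the clip can be trimmed before a generation is attempted.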

Pricing

from 3 credits

~$0.50-$3.00 per generation

Related Models

OmniHuman v1.5

Photo + Audio to talking avatar

from 2 credits · $0.32-$9.60

Kling Avatar v2

Versatile lip sync for any character

from 2 credits · $0.23-$13.80

SadTalker

Budget avatar from photo + audio

5 credits · $1.00

Sync-3 Lipsync

Video dubbing with 4K lip sync

from 2 credits · $0.27-$16.01

Hunyuan Avatar

Talking and singing, up to 120s

Fabric 1.0

Photo + audio talking avatar

from 1 credit · $0.16/s+

Infini Talk

Audio-driven talking avatar

from 4 credits · $0.40/s+

Frequently Asked Questions

How much does Wan 2.2 S2V cost?

Wan 2.2 S2V costs 3 credits per generation (~$0.50-$3.00). You get 10 free credits every day to try it.
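
As a rough illustration of what the free allowance covers, the arithmetic can be sketched as below. The helper name `free_generations` is hypothetical, and the sketch assumes a flat 3 credits per generation and a fresh 10-credit allowance each day; whether unused credits carry over is not stated here.

```python
def free_generations(daily_credits: int = 10, cost_per_generation: int = 3):
    """Return (full generations affordable, credits left over) for one day's allowance."""
    if cost_per_generation <= 0:
        raise ValueError("cost_per_generation must be positive")
    # Integer division gives whole generations; the remainder is unused credits.
    return divmod(daily_credits, cost_per_generation)


generations, leftover = free_generations()
print(f"{generations} generations per day, {leftover} credit left over")
```

Under those assumptions, the daily allowance covers three generations with one credit to spare.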

Can I use Wan 2.2 S2V outputs commercially?

Yes, all content generated with Wan 2.2 S2V on Arteza comes with a commercial license.

What file format does Wan 2.2 S2V output?

MP4 video files with lip-synced audio.
