Wan 2.2 S2V
Animate a still photo with a speech track using Wan 2.2 Speech-to-Video. The output follows your audio with natural, speech-driven motion at 480p, 580p, or 720p.
Features
- Photo + Audio Input
- Speech-Driven Motion
- 480p / 580p / 720p
- Audio-Length Output
Specifications
- Resolution: 480p / 580p / 720p
- Input: Photo + Audio + Prompt
- Audio Limit: 7.5s
- Output: MP4 Video
Input Requirements
- Source Photo (required, image upload): Front-facing photo to animate
- Audio File (required, audio upload): Speech audio to drive the motion (max 7.5s)
- Scene Description (required, textarea)
- Resolution (optional, select)
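The input constraints above can be checked before uploading. The following is a minimal sketch only: the `S2VRequest` class, its field names, and `validate_request` are hypothetical and are not part of any published Wan 2.2 S2V API. Only the limits themselves (7.5s audio, the three resolutions, the required fields) come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits taken from the Specifications and Input Requirements on this page.
ALLOWED_RESOLUTIONS = ("480p", "580p", "720p")
MAX_AUDIO_SECONDS = 7.5


@dataclass
class S2VRequest:
    """Hypothetical container for the four inputs listed above."""
    photo_path: str                   # Source Photo (required)
    audio_path: str                   # Audio File (required)
    audio_seconds: float              # duration of the audio file
    scene_description: str            # Scene Description (required)
    resolution: Optional[str] = None  # Resolution (optional)


def validate_request(req: S2VRequest) -> List[str]:
    """Return a list of human-readable problems; empty means the inputs look valid."""
    errors = []
    if not req.photo_path:
        errors.append("source photo is required")
    if not req.audio_path:
        errors.append("audio file is required")
    if req.audio_seconds <= 0 or req.audio_seconds > MAX_AUDIO_SECONDS:
        errors.append(f"audio must be longer than 0s and at most {MAX_AUDIO_SECONDS}s")
    if not req.scene_description.strip():
        errors.append("scene description is required")
    if req.resolution is not None and req.resolution not in ALLOWED_RESOLUTIONS:
        errors.append(f"resolution must be one of {', '.join(ALLOWED_RESOLUTIONS)}")
    return errors
```

For example, `validate_request(S2VRequest("face.jpg", "line.wav", 8.0, "A person talking", "1080p"))` reports two problems: the audio is too long and the resolution is not offered.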
Related Models
- OmniHuman v1.5: Photo + Audio to talking avatar (from 2 credits · $0.32-$9.60)
- Kling Avatar v2: Versatile lip sync for any character (from 2 credits · $0.23-$13.80)
- SadTalker: Budget avatar from photo + audio (5 credits · $1.00)
- Sync-3 Lipsync: Video dubbing with 4K lip sync (from 2 credits · $0.27-$16.01)
- Hunyuan Avatar: Talking and singing, up to 120s
- Fabric 1.0: Photo + audio talking avatar (from 1 credit · $0.16/s+)
- Infini Talk: Audio-driven talking avatar (from 4 credits · $0.40/s+)
Frequently Asked Questions
How much does Wan 2.2 S2V cost?
Wan 2.2 S2V costs 3 credits per generation (~$0.50-$3.00). You get 10 free credits every day to try it.
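As a quick worked example of the figures above, the sketch below computes how many generations the daily free credits cover. It assumes free credits do not carry over between days, which this page does not state; the function name is illustrative.

```python
CREDITS_PER_GENERATION = 3  # from the answer above
FREE_DAILY_CREDITS = 10     # from the answer above


def free_generations_per_day(daily_credits: int = FREE_DAILY_CREDITS,
                             cost: int = CREDITS_PER_GENERATION) -> tuple:
    """Return (full generations covered, credits left over) for one day."""
    return divmod(daily_credits, cost)


generations, leftover = free_generations_per_day()
print(generations, leftover)  # prints "3 1": three generations, one credit left
```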
Can I use Wan 2.2 S2V outputs commercially?
Yes, all content generated with Wan 2.2 S2V on Arteza comes with a commercial license.
What file format does Wan 2.2 S2V output?
MP4 video files with lip-synced audio.