Seed Audio 1.0
Seed Audio 1.0 is ByteDance's prompt-driven model for speech and sound-scene generation: describe the dialogue, narration, or ambience, and Seed Audio renders expressive audio. You can optionally steer the result with either a single reference image or up to three reference-audio clips, but not both. Reuse your own cloned voices for a consistent speaker. It supports English and Chinese, generates up to two minutes per clip, and bills on the actual output length.
Features
- Prompt-driven scenes
- Image or audio steering
- English and Chinese
- Reuse cloned voices
- Speed, volume and pitch control
- Up to 2 minutes
Specifications
- Languages: English, Chinese
- Max Length: 2 minutes
- Steering: Image or reference audio
- Input: Prompt + optional image / reference audio
- Output: MP3 audio
Input Requirements
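The input rules described on this page (a prompt, English or Chinese, and either one reference image or up to three reference-audio clips, never both) can be checked before a request is sent. The sketch below is illustrative only: `SeedAudioRequest`, `validate`, the field names, and the `en`/`zh` language codes are assumptions made for this example, not part of any documented Seed Audio API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits taken from the page text; the language codes are an assumption.
MAX_REFERENCE_AUDIO = 3
SUPPORTED_LANGUAGES = {"en", "zh"}


@dataclass
class SeedAudioRequest:
    """Hypothetical request shape, used only to illustrate the input rules."""
    prompt: str
    language: str = "en"
    reference_image: Optional[str] = None
    reference_audio: List[str] = field(default_factory=list)


def validate(req: SeedAudioRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not req.prompt.strip():
        errors.append("a prompt is required")
    if req.language not in SUPPORTED_LANGUAGES:
        errors.append("language must be English (en) or Chinese (zh)")
    # Steering is image OR audio, never both.
    if req.reference_image and req.reference_audio:
        errors.append("use a reference image or reference audio, not both")
    if len(req.reference_audio) > MAX_REFERENCE_AUDIO:
        errors.append(
            f"at most {MAX_REFERENCE_AUDIO} reference-audio clips are allowed"
        )
    return errors
```

For example, a request with a prompt and two reference-audio clips passes, while one that supplies both an image and an audio clip is rejected. The two-minute limit applies to the generated clip, so it is not something this input check can enforce.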
Related Models
ElevenLabs TTS
100+ voices, natural TTS
MiniMax Speech 2.8 HD
HD expressive text-to-speech
MiniMax Speech 2.8 Turbo
Fast, affordable text-to-speech
ElevenLabs Sound Effects
AI sound effects from text
Stable Audio
AI music generation
ElevenLabs Voice Clone
Clone a voice from one sample
ElevenLabs Translate
AI dubbing to 10 languages
ElevenLabs Audio Isolation
Vocal isolation & denoising
ElevenLabs Voice Convert
Voice-to-voice transform
MiniMax Voice Design
Custom voices from text prompt
Whisper Transcription
Speech-to-text + SRT captions
Frequently Asked Questions
How much does Seed Audio 1.0 cost?
Seed Audio 1.0 costs 1 credit per generation (~$0.03+). You get 10 free credits every day to try it.
Can I use Seed Audio 1.0 outputs commercially?
Yes, all content generated with Seed Audio 1.0 on Arteza comes with a commercial license.
What file format does Seed Audio 1.0 output?
Seed Audio 1.0 outputs MP3 audio files.