ByteDance · 2K 60FPS with native stereo audio

Seedance 2.0

Multimodal model by ByteDance for cinematic video generation with native stereo audio. Ranked #1 on Artificial Analysis for I2V+Audio quality. Up to 15 seconds, 2K@60FPS, multi-reference from up to 9 photos + 3 videos + 3 audio clips.

Duration: 4-15s
Resolution: 2K 60FPS
Audio: Stereo (SFX + BGM + dialogue)

Key features

Native 2K 60FPS

Generation in up to 2K resolution at 60 frames per second — cinematic quality without upscaling

Up to 15 seconds with multi-shot

Duration from 4 to 15 seconds in 1-second increments. Multi-shot scene support

Stereo audio: SFX + BGM + dialogue

Native stereo audio with sound effects, background music, and synchronized dialogue

Multi-reference

Up to 9 photos + 3 videos + 3 audio clips. The model extracts style, motion, and sound from all sources

Standard + Fast modes

Standard — maximum quality up to 1080p. Fast — faster and cheaper, up to 720p

360° identity preservation

Enhanced character preservation during full 360-degree camera rotation

Pricing

Cost = rate per second × duration. The rate depends on mode (Standard/Fast), resolution, and whether a video reference is used.

How to use

1. Choose mode and parameters

Standard for maximum quality (up to 1080p) or Fast for speed (up to 720p). Pick a resolution and a duration from 4 to 15 seconds.

2. Upload references (optional)

Upload up to 9 images for style/characters, up to 3 videos for motion, and up to 3 audio clips for sync. Or use text only.

3. Describe the video

Enter a description in any language. Use AI prompt enhancement to improve its structure, then translate it to English.

4. Generate

Press 'Generate'. The video, with native stereo audio, is ready in 2-5 minutes.

Frequently asked questions

What is Seedance 2.0?

Seedance 2.0 is a multimodal model by ByteDance for generating cinematic video with native stereo audio. It ranks #1 on Artificial Analysis for I2V+Audio quality and supports clips of up to 15 seconds at 2K@60FPS.

How is Seedance 2.0 different from 1.5 Pro?

Seedance 2.0 is built on a completely new architecture. It adds 2K@60FPS output (instead of 1080p), durations up to 15 seconds (instead of 12), multi-reference input (up to 9 photos + 3 videos + 3 audio clips), enhanced identity preservation during 360° rotations, text overlays, and two modes, Standard and Fast.

What resolutions and durations are available?

Standard: 480p, 720p, 1080p. Fast: 480p, 720p. Duration: from 4 to 15 seconds in 1-second increments.
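
As a rough illustration, the limits above can be checked before submitting a generation. This is only a sketch: the function and constant names are hypothetical and not part of any official Seedance API; only the values (modes, resolutions, 4 to 15 seconds in whole-second steps) come from this page.

```python
# Limits as stated on this page; all names here are illustrative assumptions.
ALLOWED_RESOLUTIONS = {
    "standard": ("480p", "720p", "1080p"),
    "fast": ("480p", "720p"),
}
MIN_DURATION_S = 4
MAX_DURATION_S = 15


def validate_settings(mode: str, resolution: str, duration_s: int) -> None:
    """Raise ValueError if the combination falls outside the listed limits."""
    if mode not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    if resolution not in ALLOWED_RESOLUTIONS[mode]:
        raise ValueError(f"{resolution} is not available in {mode} mode")
    # Durations move in 1-second increments, so only whole seconds are valid.
    if isinstance(duration_s, bool) or not isinstance(duration_s, int):
        raise ValueError("duration must be a whole number of seconds")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError("duration must be between 4 and 15 seconds")


validate_settings("standard", "1080p", 15)  # passes silently
validate_settings("fast", "720p", 4)        # passes silently
```

A call such as `validate_settings("fast", "1080p", 5)` raises `ValueError`, since Fast is limited to 720p.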

How does multi-reference work?

Upload up to 9 images for style/characters, up to 3 videos for copying motion/camera, and up to 3 audio clips for synchronization. The model automatically extracts key features and combines them with the text prompt.
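
The reference limits can be expressed as a small pre-upload check. This is a minimal sketch with hypothetical names; the only facts taken from this page are the three maximums (9 images, 3 videos, 3 audio files).

```python
# Maximum number of references per type, as stated on this page.
MAX_REFERENCES = {"images": 9, "videos": 3, "audio": 3}


def check_reference_counts(images: int = 0, videos: int = 0, audio: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the counts fit the limits."""
    counts = {"images": images, "videos": videos, "audio": audio}
    problems = []
    for kind, count in counts.items():
        if count < 0:
            problems.append(f"{kind}: count cannot be negative")
        elif count > MAX_REFERENCES[kind]:
            problems.append(f"{kind}: {count} exceeds the limit of {MAX_REFERENCES[kind]}")
    return problems


print(check_reference_counts())                    # text-only prompt: []
print(check_reference_counts(images=9, videos=3))  # at the limits: []
print(check_reference_counts(images=10))           # one problem reported
```

Returning a list instead of raising on the first error lets an upload form show every exceeded limit at once.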

Standard vs Fast — what's the difference?

Standard generates at maximum quality 2K@60FPS with 1080p support. Fast is faster and cheaper but limited to 720p. Both support all input modes and native audio.

How is the cost calculated?

Price = rate per second × duration. The rate depends on mode (Standard/Fast), resolution, and whether a video reference is used. With a video reference the rate is lower, but calculated from the sum of input and output video durations.
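
The pricing rule can be sketched in a few lines. The rates below are made-up numbers for illustration only; actual rates depend on mode, resolution, and reference use and are not listed on this page. The sketch also assumes that `reference_video_s` is the total length of all uploaded video references.

```python
def estimate_cost(rate_per_second: float, output_s: int, reference_video_s: float = 0.0) -> float:
    """Cost = rate per second x billed duration.

    Without a video reference, the billed duration is the output length.
    With one, it is the input plus the output length, as described above.
    The caller passes the rate that applies to the chosen mode and resolution.
    """
    billed_s = output_s + reference_video_s
    return rate_per_second * billed_s


# Hypothetical rates in credits per second, not real prices.
print(estimate_cost(10.0, 8))                       # 8 s billed  -> 80.0
print(estimate_cost(6.0, 8, reference_video_s=5))   # 13 s billed -> 78.0
```

With these example numbers, the lower reference rate nearly cancels out against the extra billed seconds, so a video reference does not automatically make a generation cheaper.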

Which languages does lip-sync support?

English, Chinese (Mandarin + dialects), Japanese, Korean, Spanish, Indonesian, Portuguese, and more. Supports multi-character dialogue with unique voices and accurate lip-sync.

When should I choose Seedance 2.0?

Seedance 2.0 is the best choice for professional ads, cinematic content, video with native audio and lip-sync, complex scenes with character preservation, and multi-reference generation from multiple sources.

Try Seedance 2.0

2K@60FPS, stereo audio, multi-reference, up to 15 seconds

© 2026 Sixio. All rights reserved.