ByteDance · 2K 60FPS with native stereo audio

Seedance 2.0

Seedance 2.0 is a multimodal model by ByteDance for cinematic video generation with native stereo audio. It ranks #1 on Artificial Analysis for I2V+Audio quality. It generates clips of up to 15 seconds at 2K@60FPS and accepts multi-reference input from up to 9 photos, 3 videos, and 3 audio clips.

Duration: 4-15 s
Resolution: 2K 60FPS
Audio: stereo (SFX + BGM + dialogue)

Storyboard mode

In storyboard mode, you describe an idea, an AI director breaks it into shots, an image model (Nano Banana 2 / GPT Image 2) draws a single grid image containing all the frames, and Seedance 2.0 animates that grid into a coherent multi-shot scene. You can upload subject references (for example, photos of the planes) so the model knows what the subjects look like.

1. Storyboard image (frame grid)
2. Video animated from the storyboard

1. Describe an idea. An AI director breaks the scene into 4-9 shots and writes the prompts.
2. Storyboard image. The image model draws one grid of frames (using your references). The grid is saved to the gallery.
3. Animation. Seedance 2.0 animates the panels in order into one coherent scene with audio.

Key features

Native 2K 60FPS

Generation at up to 2K resolution and 60 frames per second, for cinematic quality without upscaling

Up to 15 seconds with multi-shot

Duration from 4 to 15 seconds in 1-second increments, with support for multi-shot scenes

Stereo audio: SFX + BGM + dialogue

Native stereo audio with sound effects, background music, and synchronized dialogue

Multi-reference

Up to 9 photos + 3 videos + 3 audio clips. The model extracts style, motion, and sound from all sources

Standard + Fast modes

Standard: maximum quality, up to 1080p. Fast: faster and cheaper, up to 720p

360° identity preservation

Enhanced character preservation during full 360-degree camera rotation

Pricing

Cost = rate per second × duration. The rate depends on mode (Standard/Fast), resolution, and whether a video reference is used.

How to use

1. Choose mode and parameters

Choose Standard for maximum quality (up to 1080p) or Fast for speed (up to 720p). Pick a resolution and a duration from 4 to 15 seconds.

2. Upload references (optional)

Upload up to 9 images for style and characters, up to 3 videos for motion, and up to 3 audio clips for sync. Or use text only.

3. Describe the video

Enter a description in any language. Use AI prompt enhancement to give it an optimal structure, then translate it to English.

4. Generate

Press 'Generate'. The video, with native stereo audio, is generated in 2-5 minutes.

Frequently asked questions

What is Seedance 2.0?

Seedance 2.0 is a multimodal model by ByteDance for generating cinematic video with native stereo audio. It ranks #1 on Artificial Analysis for I2V+Audio quality and supports clips of up to 15 seconds at 2K@60FPS.

How is Seedance 2.0 different from 1.5 Pro?

Seedance 2.0 is built on a completely new architecture. Compared with 1.5 Pro, it offers 2K@60FPS output (instead of 1080p), clips of up to 15 seconds (instead of 12), multi-reference input (up to 9 photos + 3 videos + 3 audio clips), enhanced identity preservation during 360° rotations, text overlays, and two modes, Standard and Fast.

What resolutions and durations are available?

Standard: 480p, 720p, 1080p. Fast: 480p, 720p. Duration: from 4 to 15 seconds in 1-second increments.
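
The limits above can be checked before a generation request is submitted. The following is a minimal sketch in Python, assuming only the values listed in this answer; the names ALLOWED_RESOLUTIONS and validate_request are hypothetical and are not part of any published Seedance API.

```python
# Hypothetical helper: the names below are illustrative only.
# The values come from the resolution and duration limits listed on this page.
ALLOWED_RESOLUTIONS = {
    "standard": ("480p", "720p", "1080p"),
    "fast": ("480p", "720p"),
}
MIN_DURATION_S = 4
MAX_DURATION_S = 15


def validate_request(mode: str, resolution: str, duration_s: int) -> None:
    """Raise ValueError if the combination falls outside the listed limits."""
    key = mode.lower()
    if key not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    if resolution not in ALLOWED_RESOLUTIONS[key]:
        raise ValueError(f"{resolution} is not available in {mode} mode")
    # Durations move in 1-second increments, so only whole seconds are valid.
    if isinstance(duration_s, bool) or not isinstance(duration_s, int):
        raise ValueError("duration must be a whole number of seconds")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )
```

For instance, validate_request("Standard", "1080p", 15) passes silently, while validate_request("Fast", "1080p", 5) raises ValueError because Fast stops at 720p.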

How does multi-reference work?

Upload up to 9 images for style and characters, up to 3 videos for copying motion and camera work, and up to 3 audio clips for synchronization. The model automatically extracts key features and combines them with the text prompt.
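
As an illustration of these upload limits, here is a minimal Python sketch that checks a set of references before upload. The ReferenceSet class and its field names are assumptions made for this example; only the limits (9 images, 3 videos, 3 audio clips) come from the text above.

```python
from dataclasses import dataclass, field
from typing import List

# Limits taken from this page; everything else here is hypothetical.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3


@dataclass
class ReferenceSet:
    """File paths grouped by the role each reference type plays."""
    images: List[str] = field(default_factory=list)  # style / characters
    videos: List[str] = field(default_factory=list)  # motion / camera
    audio: List[str] = field(default_factory=list)   # synchronization

    def problems(self) -> List[str]:
        """Return one message per exceeded limit (empty list if all is fine)."""
        found = []
        checks = (
            ("images", self.images, MAX_IMAGES),
            ("videos", self.videos, MAX_VIDEOS),
            ("audio clips", self.audio, MAX_AUDIO),
        )
        for label, items, limit in checks:
            if len(items) > limit:
                found.append(f"too many {label}: {len(items)} > {limit}")
        return found
```

An empty ReferenceSet reports no problems, which matches the text-only case where no references are uploaded.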

Standard vs Fast — what's the difference?

Standard generates at maximum quality (2K@60FPS) with 1080p support. Fast is faster and cheaper but limited to 720p. Both support all input modes and native audio.

How is the cost calculated?

Price = rate per second × duration. The rate depends on mode (Standard/Fast), resolution, and whether a video reference is used. With a video reference the per-second rate is lower, but the billed duration is the sum of the input and output video durations.
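
The calculation described in this answer can be sketched in Python as follows. The function name estimate_cost is hypothetical, and the rate is passed in by the caller because this page does not list actual rates; the numbers in the usage note are placeholders, not real prices.

```python
from typing import Sequence


def estimate_cost(
    rate_per_second: float,
    output_duration_s: int,
    reference_video_durations_s: Sequence[float] = (),
) -> float:
    """Estimate the cost in credits as rate per second x billed duration.

    Without a video reference, the billed duration is the output duration.
    With video references, the billed duration is the sum of the input and
    output video durations. The caller must supply the rate that applies to
    the chosen mode, resolution, and reference usage.
    """
    billed_seconds = output_duration_s + sum(reference_video_durations_s)
    return rate_per_second * billed_seconds
```

With a made-up rate of 2.0 credits per second, a 10-second clip costs 20.0 credits. With a made-up lower rate of 1.5 and one 5-second video reference, the same clip is billed for 15 seconds and costs 22.5 credits, so a lower rate does not always mean a lower total.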

Which languages does lip-sync support?

English, Chinese (Mandarin + dialects), Japanese, Korean, Spanish, Indonesian, Portuguese, and more. Multi-character dialogue is supported, with unique voices and accurate lip-sync.

When should I choose Seedance 2.0?

Seedance 2.0 is the best choice for: professional ads, cinematic content, video with native audio and lip-sync, complex scenes with character preservation, multi-reference generation from multiple sources.

Try Seedance 2.0

2K@60FPS, stereo audio, multi-reference, up to 15 seconds.

© 2026 Sixio. All rights reserved.
