ByteDance · Cinematic video with audio

Seedance 1.5 Pro

A ByteDance model for cinematic video generation with native audio: the lowest per-second price on the platform, 6 aspect ratios, and First+Last Frame morphing.

Duration: 4-12 s
Resolution: up to 1080p
Formats: 6, including 21:9 CinemaScope

Seedance 1.5 Pro specifics

Seedance 1.5 Pro is a model by ByteDance (the creators of TikTok). It generates cinematic video with native audio and camera control, and supports 6 aspect ratios, including CinemaScope 21:9. It has the lowest per-second price on the platform.

💡 Useful tips:

  • Describe the subject, action, camera, and atmosphere: these 4 elements are required in every prompt
  • Turn on Fixed Lens to lock the camera in place, ideal for portraits and dialogue
  • Start with 480p without audio for cheap tests, then move to 720p/1080p

Key features

6 aspect ratios

1:1, 4:3, 3:4, 16:9, 9:16 and CinemaScope 21:9 — works for any content

21:9 CinemaScope · 9:16 Stories

Native audio

Sound effects, background music and voices — synced to on-screen motion

Optional · SFX + BGM

Lowest price

From 2 cr/sec — cheaper than any other model on the platform

480p/720p/1080p · Pay per second

Text → Video

Generate the entire video from a text description — without images

Image → Video

Bring 1 image to life or morph between 2 (First+Last Frame)

Fixed Lens

Static camera for dialogue and portraits. Turn off for dynamic shots

Generation modes

Text-to-Video

Generate video from text — without uploading images, the model creates everything from scratch.

  • Good for clips, ads, and atmospheric scenes
  • Max prompt length: 2500 characters
  • Works with all 6 formats

Image-to-Video (1 or 2 images)

Animate 1 image or create a smooth transition between 2 frames (First+Last Frame).

  • 1 image — animate the image while keeping its style
  • 2 images — morph from the first frame to the last
  • JPG, PNG, WebP · max 10 MB
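Before uploading, it can help to check files against these limits locally. The sketch below is a hypothetical helper, not part of Sixio: it assumes "10 MB" means 10 × 1024 × 1024 bytes, that the .jpeg extension is accepted alongside .jpg, and the mode labels it returns ("animate", "first_last_frame") are illustrative names only.

```python
from pathlib import Path

# Limits stated on this page: JPG, PNG, WebP, max 10 MB per image.
# Assumptions: ".jpeg" is accepted, and 10 MB = 10 * 1024 * 1024 bytes.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image(name: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is outside the stated limits."""
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"{name}: unsupported format {suffix or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        raise ValueError(f"{name}: {size_bytes} bytes exceeds the 10 MB limit")


def image_mode(images: list[tuple[str, int]]) -> str:
    """Pick the Image-to-Video mode from the uploaded (name, size) pairs.

    The returned labels are hypothetical, not platform identifiers.
    """
    for name, size in images:
        check_image(name, size)
    if len(images) == 1:
        return "animate"            # 1 image: animate it, keeping its style
    if len(images) == 2:
        return "first_last_frame"   # 2 images: morph from first to last
    raise ValueError(f"expected 1 or 2 images, got {len(images)}")
```

For example, a single 2 MB PNG selects "animate", while a JPG start frame plus a WebP end frame selects "first_last_frame".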

Prompt guide

6 key prompt elements

Seedance uses a scripting approach: write the prompt as a scene description. Keep the elements in a strict order, because the model prioritizes them by their position in the text.

1. Subject (who/what)
2. Action
3. Facial expression
4. Camera motion
5. Atmosphere and light
6. Sounds (if audio is on)
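The fixed order lends itself to a small template. This is a hypothetical helper, not the platform's prompt enhancement system: it joins whichever elements you supply in the order listed, drops the sounds element when audio is off, prefixes sounds with "Sounds:" as the example prompt does, and enforces the 2500-character limit stated for Text-to-Video.

```python
# Element order from the guide: earlier text gets higher priority.
ORDER = ["subject", "action", "expression", "camera", "atmosphere", "sounds"]
MAX_PROMPT_CHARS = 2500  # Text-to-Video limit stated on this page


def build_prompt(elements: dict[str, str], audio: bool = True) -> str:
    """Join scene elements in the fixed order, skipping empty ones."""
    unknown = set(elements) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown elements: {sorted(unknown)}")
    parts = []
    for key in ORDER:
        text = elements.get(key, "").strip()
        if not text or (key == "sounds" and not audio):
            continue  # missing element, or sounds while audio is off
        if key == "sounds":
            text = "Sounds: " + text
        # End each element as its own sentence.
        parts.append(text if text.endswith((".", "!", "?")) else text + ".")
    prompt = " ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} chars; limit is {MAX_PROMPT_CHARS}")
    return prompt
```

Because the order lives in one list, the dict you pass can be in any order; for instance, build_prompt({"camera": "Slow zoom in", "subject": "A cat"}) returns "A cat. Slow zoom in."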

Prompt example:

"A street musician sits by a brick wall in daylight and plays an acoustic guitar. The camera pans slowly left to right, moving from his hands on the strings to his face. A passerby drops a coin into the open case. Sounds: rhythmic acoustic guitar, light city ambience, footsteps on the sidewalk. Soft, natural light, documentary aesthetic."

Camera control

Describe camera motion in plain words: pan, tilt, zoom, rotation. Specify direction, speed, and trajectory.

Motion examples:
  • "Smooth pan left to right"
  • "Slow tilt up to the sky"
  • "Gradual zoom in to the face"
  • "Camera orbits the object 180°"
Advanced effects:
  • Dolly zoom: the camera pulls back while zooming in, so the subject stays the same size as the background shifts
  • Tracking shot: the camera follows the subject
  • Orbital shot: the camera arcs around the subject
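The motion examples share a simple pattern: speed, then the move, then direction or trajectory. The sketch below composes phrases in that pattern. It is an assumption drawn from the examples, not a syntax the model requires, and the word lists (including "fast") are illustrative.

```python
# Moves and speeds drawn from the guide's examples; the template
# "speed + move + direction" is an assumption, not a required syntax.
MOVES = {"pan", "tilt", "zoom", "rotation", "orbit", "tracking shot", "dolly zoom"}
SPEEDS = {"slow", "smooth", "gradual", "fast"}


def camera_phrase(move: str, speed: str = "", direction: str = "") -> str:
    """Compose a plain-words camera instruction such as 'Slow tilt up to the sky'."""
    move = move.lower().strip()
    if move not in MOVES:
        raise ValueError(f"unknown camera move: {move!r}")
    speed = speed.lower().strip()
    if speed and speed not in SPEEDS:
        raise ValueError(f"unknown speed: {speed!r}")
    words = [w for w in (speed, move, direction.strip()) if w]
    phrase = " ".join(words)
    return phrase[0].upper() + phrase[1:]
```

Calling camera_phrase("pan", "smooth", "left to right") reproduces the first example above, "Smooth pan left to right".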

Audio: avoid conflicts

The model links visual motion to sound. Contradictions yield unpredictable results.

❌ Bad:

"A quiet thunderstorm with strong wind and bright lightning" — 3 conflicting signals

✅ Good:

"A distant thunderstorm with quiet thunder and a light wind" — one dominant tone

What NOT to do

  • Conflicting sound: "quiet" + "loud explosion" in the same prompt
  • Sudden lighting change: from a bright café straight to a night street
  • Too many details for a 4-second clip: the model needs time to play out each action
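Conflicting loudness cues can be caught before generating with a crude keyword check. This is a hypothetical heuristic with made-up word lists, not something the model exposes; it only flags a prompt that mixes quiet and loud cue words, as in the "bad" thunderstorm example, and it will miss conflicts phrased in other words.

```python
import re

# Hypothetical cue-word lists; extend them for your own prompts.
QUIET = {"quiet", "soft", "distant", "faint", "gentle", "calm"}
LOUD = {"loud", "strong", "bright", "booming", "explosion", "roaring", "deafening"}


def loudness_conflicts(prompt: str) -> tuple[set[str], set[str]]:
    """Return (quiet cues, loud cues) if the prompt uses both, else two empty sets."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    quiet, loud = words & QUIET, words & LOUD
    return (quiet, loud) if quiet and loud else (set(), set())
```

On "A quiet thunderstorm with strong wind and bright lightning" it reports "quiet" against "strong" and "bright"; on "A distant thunderstorm with quiet thunder and a light wind" it reports nothing.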

💡 Our prompt enhancement system helps assemble the description in the right format

What works well

  • Cinematic scenes with camera movement (pans, tracking, orbital)
  • Atmospheric landscapes and documentary aesthetics
  • Clips and ad teasers — dynamic short scenes
  • First+Last Frame — transformations, morphing, before/after
  • Photo animation (image-to-video from 1 image)
  • Sound effects synced to the action (wind, footsteps, music)

For Image-to-Video: describe only the motion

The model already SEES everything in the image, so describe only what should change. When uploading 2 images, describe how the transition between them happens.

✅ WHAT TO DESCRIBE:

Actions, camera motion, physics (wind, light), sounds, transitions between frames

❌ DON'T DESCRIBE:

Appearance, clothing, hair color, background details — the model takes them from the photo

Pricing

Price = price per second × duration. Depends on resolution and whether audio is enabled.
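The formula is simple enough to check by hand. In the sketch below the function names are hypothetical, and no real price table is included because only the "from 2 cr/sec" starting price appears on this page; pass in the per-second rates shown in the pricing table. Treating 2 cr/sec as the 480p no-audio rate is an assumption.

```python
def video_cost(rate_per_sec: float, duration_s: int) -> float:
    """Cost in credits = per-second price x duration, for the stated 4-12 s range."""
    if not 4 <= duration_s <= 12:
        raise ValueError("duration must be between 4 and 12 seconds")
    if rate_per_sec <= 0:
        raise ValueError("rate must be positive")
    return rate_per_sec * duration_s


def cost_for(rates: dict[tuple[str, bool], float], resolution: str,
             audio: bool, duration_s: int) -> float:
    """Look up the per-second rate for (resolution, audio) and apply the formula."""
    return video_cost(rates[(resolution, audio)], duration_s)
```

At 2 cr/sec, a 4-second draft costs 8 credits and a 12-second clip costs 24; higher resolutions or enabled audio scale the same way with their own per-second rate.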


Video examples with prompts

Study the prompts for high-quality video — these examples help you grasp the structure of effective descriptions

Sixio magic

Premium photorealistic fantasy realism. Medium shot: Subject raises hand, draws semicircle with blue magical glow and sparks forming glowing "Sixio" inscription. Cut to close-up: soft excited smile, gaze forward, inscription blurred in background. Audio: winter wind, magical bells, soft female voice "Sixio, can you hear me?" with English lip-sync, calm orchestral New Year melody. Volumetric lighting, high detail.

fantasy · lip-sync · VFX · music

Talking cat in the Forbidden City

Beneath the red walls of the Forbidden City on a beautiful day, a chubby orange cat wearing sunglasses lounges arrogantly on a traditional armchair like a boss. It lazily speaks in a Beijing accent: "得嘞,今儿个天儿不错,来晒晒太阳" ("All right, nice weather today, time to soak up some sun").

Talking character · Lip sync · Portrait

Invitation to dance

Elegant woman in flowing evening gown walks away, turns head, reaches out invitingly and says "Shall we dance?" Elegant pose, luxury hotel corridor, natural body proportions, refined silhouette, soft golden hour lighting, fashion editorial style, shot from behind at 45 degree angle, cinematic composition, fashion magazine aesthetic.

Talking character · Fashion shoot · Golden Hour

Mech takeoff

An internal light source from the mech's chest shines outward as the mech passes through the hangar's light curtain and ascends directly into the sky, achieving a smooth transition from mechanical interior space to open air.

sci-fi · dynamic · lighting

Silk dress

Soft light pours down, silk shimmering with a pearlescent sheen. She gently walks toward the camera, the skirt draping and flowing, outlining a form that is both soft and luxurious.

fashion · texture · lighting

Steampunk technician

On an aerial corridor in a steampunk city, a mechanical technician stands on a suspended bridge, rapidly communicating with an assistant via radio in Cantonese: "气压表又掉了,帮我开三号阀!" ("The pressure gauge dropped again, open valve three for me!") The camera looks up from beneath the bridge, emphasizing the dizzying height. The massive sounds of gears and pistons blend with the wind, forming a complex environmental soundscape.

steampunk · talking character · sound design

How to use

1

Choose a mode

Text→Video for generation from scratch. Image→Video — upload 1 image (animation) or 2 (First+Last Frame morphing).

2

Configure parameters

Pick a format (16:9, 9:16, 21:9...), resolution, duration, and audio. Fixed Lens — for a static camera.

3

Describe the video in your language

Enter a description in any language, but keep camera terms such as Dolly zoom and Tracking shot in English. AI will enhance the prompt with the proper element structure.

4

Translate and generate

Press 'Translate to English' and AI will translate the prompt for free. Then press 'Generate'. The result is ready in 2-5 minutes.
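Steps 1 and 2 amount to a small set of settings. The dataclass below sketches them with the values this page lists (6 aspect ratios, 480p/720p/1080p, 4-12 s, optional audio, Fixed Lens, 0-2 images). The field names, the defaults, and the idea of expressing them in code are assumptions for illustration; this is not a Sixio API.

```python
from dataclasses import dataclass

ASPECT_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16", "21:9"}
RESOLUTIONS = {"480p", "720p", "1080p"}


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the step 1-2 choices; not a platform API."""
    aspect_ratio: str = "16:9"
    resolution: str = "480p"   # cheapest option, good for drafts
    duration_s: int = 4        # allowed range: 4-12 s
    audio: bool = False        # enabling audio changes the price
    fixed_lens: bool = False   # True locks the camera (dialogue, portraits)
    images: int = 0            # 0 = Text-to-Video, 1 = animate, 2 = First+Last Frame

    def __post_init__(self) -> None:
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio {self.aspect_ratio}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution {self.resolution}")
        if not 4 <= self.duration_s <= 12:
            raise ValueError("duration must be 4-12 seconds")
        if self.images not in (0, 1, 2):
            raise ValueError("upload 0, 1, or 2 images")

    @property
    def mode(self) -> str:
        """Generation mode implied by the number of uploaded images."""
        return {0: "text-to-video", 1: "image-to-video", 2: "first+last frame"}[self.images]
```

Validating in __post_init__ means an invalid combination, such as a 15-second clip, fails as soon as the settings are created rather than at generation time.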

Frequently asked questions

Seedance 1.5 Pro — what is this model?

Seedance 1.5 Pro is a model from ByteDance (creators of TikTok) for generating cinematic video with synchronized audio. Supports text→video and image→video (including First+Last Frame).

Which resolutions are available?

480p (standard), 720p (high), and 1080p (Full HD). Pick 480p for drafts and tests — it's the cheapest option.

How does audio generation work?

The model automatically generates sound effects, background music, and voices synchronized to on-screen motion. Audio is optional, enabled separately, and affects the price.

Why use the Fixed Lens option?

Fixed Lens locks the camera: the frame stays static and only the subject moves. It is useful for dialogue and talking heads; disable it for dynamic flythroughs and pans.

Can I upload 2 images?

Yes! 1 image — classic image-to-video. 2 images — First+Last Frame mode: the model creates a smooth transition between the start and end frames.

Which aspect ratios are supported?

Seedance supports 6 aspect ratios: 1:1, 21:9 (CinemaScope), 4:3, 3:4, 16:9 (landscape), and 9:16 (portrait). The widest choice among our models.

How is the cost calculated?

Cost = price per second × duration. Per-second price depends on resolution (480p/720p/1080p) and audio. The cheapest option is 480p without audio.

When should I pick Seedance over other models?

Seedance is the best choice for: budget clips and ads, animating photos (image-to-video), morphing between two frames (First+Last), video with native audio, and content needing many aspect ratios (including 21:9 CinemaScope).

Ready to try Seedance 1.5 Pro?

Cinematic video with native audio, 6 aspect ratios, First+Last Frame — from 2 cr/sec

© 2026 Sixio. All rights reserved.
