Alibaba Wan-AI · 4 unique modes

WAN 2.7 Video Generator

Universal AI video generator with unique R2V multi-reference and VideoEdit AI editing modes. Prompts up to 5000 characters, negative prompt, 720p/1080p.

Duration: 3-15s
Modes: 4 (T2V · I2V · R2V · Edit)
Resolution: 720p/1080p

Write in your own language

Our AI assistant automatically translates and adapts your prompt for WAN 2.7. Long, detailed descriptions up to 5000 characters work significantly better than short ones.

4 unique modes

Text-to-Video

3/5/10/15s

Generate video from a text description. 5 aspect ratios, negative prompt, seed. You can upload an audio reference (music or a sound); specify its purpose in the prompt.

Image-to-Video

3/5/10/15s

Bring an image to life: upload a photo and describe the motion. First and last frames let you control the animation. Audio reference for voice or music — specify its purpose in the prompt.

R2V — Multi-reference

3/5/10s

Unique mode: up to 3 images + 2 videos as references. Bind them in the prompt via @image1, @video1. Audio reference for voice or music — specify its purpose in the prompt.

VideoEdit — AI editing

2-10s

AI video editing by prompt. Upload a source video (up to 10s), describe the changes. Up to 3 reference images. Audio mode: auto (AI decides) or keep the original audio.

Key features

Prompt up to 5000 chars

The longest prompt of all models — detailed descriptions work much better

Negative Prompt

Exclude unwanted elements: blur, distortion, watermarks — up to 500 chars

R2V multi-reference

Combine up to 5 references (3 photos + 2 videos) for precise style and character control

VideoEdit AI editing

Edit video with text: change style, add elements, alter lighting

5 aspect ratios

16:9, 9:16, 1:1, 4:3, 3:4 — for any format: YouTube, Shorts, Reels, Stories

Photorealism without artifacts

The model is biased toward realism — no need to add "photorealistic" to the prompt

Audio references per mode

WAN 2.7 supports uploading one audio file as a reference. The purpose of the audio (voice, music, background) is set by the prompt:

Text-to-Video

Upload audio as a music reference or background sound for the video. Specify in the prompt how to use the audio. Formats: MP3, WAV, AAC, OGG. Up to 50 MB.

Image-to-Video

Upload audio as a voice, music, or background-sound reference. Specify its purpose in the prompt. Good for talking heads and animations driven by speech/music. Formats: MP3, WAV, AAC, OGG.

R2V — Multi-reference

Voice or music reference; specify its purpose in the prompt. If a video reference contains audio, that audio is used by default, but an uploaded audio file takes priority over it. Formats: WAV, MP3. Length 1-10s, up to 15 MB.

VideoEdit — Audio mode

Not a file but a toggle. "Auto" — AI decides whether to regenerate audio based on the prompt. "Original" — keeps the source audio from the uploaded video unchanged.
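
The per-mode audio limits above can be checked before upload. The following is a minimal sketch, not the service's own validation: the mode keys ("t2v", "i2v", "r2v") and the function name check_audio are hypothetical, and the limits are copied from this page. Where the page states no limit (file size for Image-to-Video), the check is skipped.

```python
from pathlib import PurePath

# Limits as listed on this page; None means the page states no limit.
# Mode keys are illustrative names, not official API identifiers.
AUDIO_RULES = {
    "t2v": {"formats": {"mp3", "wav", "aac", "ogg"}, "max_mb": 50, "seconds": None},
    "i2v": {"formats": {"mp3", "wav", "aac", "ogg"}, "max_mb": None, "seconds": None},
    "r2v": {"formats": {"wav", "mp3"}, "max_mb": 15, "seconds": (1, 10)},
}


def check_audio(mode, filename, size_mb, duration_s):
    """Return a list of problems; an empty list means the file fits the limits."""
    rules = AUDIO_RULES.get(mode)
    if rules is None:
        # VideoEdit uses an auto/original toggle rather than an audio file.
        return [f"mode {mode!r} takes no audio file"]

    problems = []
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in rules["formats"]:
        problems.append(f"format {ext!r} is not accepted in {mode}")
    if rules["max_mb"] is not None and size_mb > rules["max_mb"]:
        problems.append(f"file is {size_mb} MB, limit is {rules['max_mb']} MB")
    if rules["seconds"] is not None:
        low, high = rules["seconds"]
        if not low <= duration_s <= high:
            problems.append(f"audio must be {low}-{high}s long, got {duration_s}s")
    return problems
```

For example, check_audio("r2v", "voice.wav", 2.0, 5.0) returns an empty list, while a 30-second OGG file of 20 MB fails all three R2V checks.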

Prompt guide

Text-to-Video and Image-to-Video

Write detailed descriptions — up to 5000 chars. The more detail, the better the result.

Prompt structure: Scene → Action → Camera → Lighting → Atmosphere

A young woman in a white dress walks through a blooming garden at sunset. The camera slowly follows her at shoulder height. Golden light filters through the apple branches, creating warm highlights on her face. A light wind stirs the cherry petals floating in the air. Depth of field — foreground blurred, focus on the heroine.
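
The Scene → Action → Camera → Lighting → Atmosphere structure can be assembled programmatically. This is a minimal sketch under stated assumptions: build_prompt is a hypothetical helper, not part of any WAN 2.7 API, and the only rule taken from this page is the 5000-character limit.

```python
MAX_PROMPT_CHARS = 5000  # prompt limit stated on this page


def build_prompt(scene, action, camera, lighting, atmosphere):
    """Join the five parts in the recommended order and enforce the length limit."""
    sentences = []
    for part in (scene, action, camera, lighting, atmosphere):
        text = part.strip()
        if not text:
            continue  # an empty part is simply left out
        if text[-1] not in ".!?":
            text += "."  # make each part a complete sentence
        sentences.append(text)

    prompt = " ".join(sentences)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} chars, limit is {MAX_PROMPT_CHARS}"
        )
    return prompt


prompt = build_prompt(
    scene="A young woman in a white dress in a blooming garden at sunset",
    action="She walks slowly between the apple trees",
    camera="The camera follows her at shoulder height",
    lighting="Golden light filters through the branches",
    atmosphere="Petals drift in a light wind",
)
```

Keeping the parts separate makes it easy to vary one element, such as the camera move, while the rest of the description stays fixed.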

R2V — Multi-reference

Upload reference images/videos and mention them in the prompt via tags:

  • @image1, @image2, @image3 — for images
  • @video1, @video2 — for video
A girl with the face and hairstyle of @image1 dances in the style of @video1 against the mountain landscape of @image2. The camera smoothly orbits the dancing figure. Sunset lighting.
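
Before submitting, it can help to confirm that every tag in the prompt points at an uploaded reference. The sketch below is an illustration only: check_r2v_tags is a hypothetical helper, and treating an uploaded but unmentioned reference as a problem is an assumption, since this page does not say how the model handles unused uploads.

```python
import re

TAG_PATTERN = re.compile(r"@(image|video)(\d+)")
MAX_REFS = {"image": 3, "video": 2}  # limits stated on this page


def check_r2v_tags(prompt, n_images, n_videos):
    """Return a list of problems; an empty list means tags and uploads agree."""
    uploaded = {"image": n_images, "video": n_videos}
    problems = []

    for kind, limit in MAX_REFS.items():
        if uploaded[kind] > limit:
            problems.append(f"too many {kind} references: {uploaded[kind]} > {limit}")

    # Collect every distinct tag used in the prompt, e.g. ("image", 1).
    used = {(kind, int(num)) for kind, num in TAG_PATTERN.findall(prompt)}

    for kind, index in sorted(used):
        if index < 1 or index > uploaded[kind]:
            problems.append(f"@{kind}{index} has no matching upload")

    # Assumption: an upload that is never mentioned is probably a mistake.
    for kind, count in uploaded.items():
        for index in range(1, min(count, MAX_REFS[kind]) + 1):
            if (kind, index) not in used:
                problems.append(f"@{kind}{index} is uploaded but never mentioned")

    return problems
```

With the example prompt above, two uploaded images and one uploaded video pass the check; uploading only one image would flag @image2 as having no matching upload.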

VideoEdit — AI editing

Upload a source video (up to 10s) and describe the changes. You can add up to 3 reference images.

Replace the daytime lighting with a night scene featuring neon signs. Add rain and wet reflections on the asphalt. Preserve the main character's motion.

Specifications

Developer: Alibaba (Wan-AI)
Resolution: 720p / 1080p
Duration: 3 / 5 / 10 / 15s (R2V/VideoEdit: up to 10s)
Aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4
Max prompt: 5000 chars
Negative prompt: Up to 500 chars
Generation modes: Text-to-Video, Image-to-Video, R2V, VideoEdit
R2V references: Up to 3 images + 2 videos
VideoEdit input: Video up to 10s + up to 3 reference images
Audio references: T2V/I2V/R2V: voice or music reference (specify in the prompt) · Edit: auto/original
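
The limits in this table can be expressed as a small pre-flight check. This is a sketch, not an official client: the mode keys ("t2v", "i2v", "r2v", "edit") and the function validate_request are hypothetical names, and the aspect ratio is optional here because the page does not say whether VideoEdit accepts one.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720p", "1080p"}
# Fixed duration choices per mode, in seconds, as listed on this page.
DURATIONS = {
    "t2v": {3, 5, 10, 15},
    "i2v": {3, 5, 10, 15},
    "r2v": {3, 5, 10},
}
EDIT_DURATION_RANGE = (2, 10)  # VideoEdit accepts a range, not fixed steps
MAX_PROMPT_CHARS = 5000
MAX_NEGATIVE_CHARS = 500


def validate_request(mode, duration_s, resolution, prompt,
                     negative_prompt="", aspect_ratio=None):
    """Return a list of error strings; an empty list means the request fits."""
    errors = []

    if mode == "edit":
        low, high = EDIT_DURATION_RANGE
        if not low <= duration_s <= high:
            errors.append(f"duration must be {low}-{high}s in edit mode")
    elif mode in DURATIONS:
        if duration_s not in DURATIONS[mode]:
            errors.append(f"duration {duration_s}s is not offered in {mode}")
    else:
        errors.append(f"unknown mode {mode!r}")

    if resolution not in RESOLUTIONS:
        errors.append(f"unsupported resolution {resolution!r}")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio {aspect_ratio!r}")
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} chars")
    if len(negative_prompt) > MAX_NEGATIVE_CHARS:
        errors.append(f"negative prompt exceeds {MAX_NEGATIVE_CHARS} chars")
    return errors
```

For instance, a 15-second request in R2V mode is rejected because that mode tops out at 10 seconds, while the same duration passes for Text-to-Video.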

Frequently asked questions

What is WAN 2.7?
WAN 2.7 is an AI video generation model from Alibaba (Wan-AI). It offers 4 modes: Text-to-Video, Image-to-Video, R2V (multi-reference), and VideoEdit (AI editing). It accepts prompts up to 5000 characters and a negative prompt, and outputs 720p or 1080p video.
What is R2V mode?
R2V (Reference-to-Video) — a unique multi-reference generation mode. Upload up to 3 images and 2 videos, bind them in the prompt via @image1, @image2, @video1, @video2. The model combines style, characters, and motion from the references into a new video.
What is VideoEdit mode?
VideoEdit — AI editing of an existing video via a text prompt. Upload the source video (up to 10 sec), describe the desired changes, and the model will edit it. You can add up to 3 reference images for precise style. Audio mode: auto (AI decides) or keep original audio.
How does 720p differ from 1080p?
720p — standard resolution, faster and cheaper. 1080p — higher detail and sharpness, suitable for final content and production.
How is the cost calculated?
Per-second pricing: cost = price per second × duration. Rate depends on resolution (720p/1080p).
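
As a worked illustration of the formula, the sketch below multiplies a per-second rate by the duration. The rates are placeholders invented for the example, since no actual prices appear on this page; generation_cost and EXAMPLE_RATES are hypothetical names.

```python
# Placeholder rates in credits per second. These are NOT real prices;
# substitute the current rates shown by the service.
EXAMPLE_RATES = {"720p": 10, "1080p": 20}


def generation_cost(duration_s, resolution, rate_per_second):
    """cost = price per second x duration, with the rate chosen by resolution."""
    if resolution not in rate_per_second:
        raise ValueError(f"no rate for resolution {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return rate_per_second[resolution] * duration_s


cost_720 = generation_cost(5, "720p", EXAMPLE_RATES)    # 10 * 5
cost_1080 = generation_cost(5, "1080p", EXAMPLE_RATES)  # 20 * 5
```

With these placeholder rates, a 5-second clip costs 50 credits at 720p and 100 credits at 1080p.
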
What is a negative prompt?
A negative prompt describes what should NOT appear in the video. For example: "blur, low quality, distorted faces, watermarks". It helps improve the result by excluding unwanted artifacts. Up to 500 characters.
Which aspect ratios are available?
5 options: 16:9 (landscape), 9:16 (portrait for Shorts/Reels), 1:1 (square), 4:3 (classic), and 3:4 (portrait). Use first/last frame in I2V for precise control.
How does R2V reference binding work?
In R2V mode, upload images and videos, then mention them in the prompt: "A girl with the face of @image1 dances as in @video1 against the landscape of @image2". The model combines the visual elements from the referenced inputs.
