Native 4K, up to 7 reference units (images and video), and native audio with lip-sync. Generate 4-10 second clips with no subscription.
4K quality for advertising and marketing videos with native audio, lip-sync, and cinematic detail.
Up to 7 product reference photos as input — the model creates a consistent demo video from different angles.
Native 4K, camera and lighting control, atmospheric sound — for short films and branded content.
Gemini Omni uses a unit system for input references. The limit is 7 units per generation; combine images and video however you like within that quota.
Upload up to 7 reference photos — different angles, characters, or storyboard. The model will combine the context of all images.
A video reference can be up to 100 MB and 30 seconds long. The model preserves its camera movement and dynamics, while you change the style, characters, or environment.
For example: 1 video (2 units) + 5 photos (5 units) = 7. Or just a text prompt without references — that works too.
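The unit arithmetic above can be sketched in a few lines of Python. This is only an illustration of the quota rule as described here (image = 1 unit, video = 2 units, limit 7); the function names are hypothetical and not part of any Gemini Omni API, and the sketch does not check whether more than one video is accepted, which the text does not say.

```python
# Unit costs and limit as stated in the text; names are illustrative only.
IMAGE_UNITS = 1
VIDEO_UNITS = 2
MAX_UNITS = 7


def units_used(images: int, videos: int) -> int:
    """Return how many reference units a set of uploads consumes."""
    if images < 0 or videos < 0:
        raise ValueError("counts must be non-negative")
    return images * IMAGE_UNITS + videos * VIDEO_UNITS


def fits_quota(images: int, videos: int) -> bool:
    """True if the uploads fit within the 7-unit limit."""
    return units_used(images, videos) <= MAX_UNITS


print(units_used(5, 1))  # 1 video + 5 photos -> 7
print(fits_quota(7, 0))  # 7 photos -> True
print(fits_quota(6, 1))  # 1 video + 6 photos = 8 units -> False
```

A text-only prompt corresponds to `units_used(0, 0)`, which is 0 units and always fits.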
| Resolution | 4s | 6s | 8s | 10s |
|---|---|---|---|---|
| 720p / 1080p | 72 | 96 | 120 | 144 |
| 4K | 168 | 192 | 216 | 240 |
Prices are in credits. With a video reference the price is fixed: 192 credits (720p/1080p) or 288 credits (4K).
Prices start at 72 credits (4s, 720p/1080p). Pay only for results.
Gemini Omni works great with structured prompts. The 5-component formula covers all aspects of video. Use the special format for audio.
SFX: Sound effects. Example: "SFX: glass shattering, metallic clang".
Ambient noise: Background ambient sounds. Example: "Ambient noise: city traffic, distant sirens, wind".
"dialogue": Dialogue in quotes. Example: "We're almost there" (male voice, calm tone). Lip-sync works in many languages.
For 4K, add specific texture details: "visible fabric weave", "water droplets on glass", "individual hair strands". The model reveals detail through named textures.
When uploading multiple photos, describe in the prompt how they are related: "transition from photo 1 to photo 3" or "all angles of one object". The model builds a better narrative with context.
Name specific camera parameters: lens (85mm portrait, 24mm wide), shot type (close-up, wide), and motion (slow orbit, tracking). Prefer slow, measured camera motion for a more stable result.
Combine all three audio types: SFX for spot sounds, Ambient noise for atmosphere, and dialogue in quotes for speech. Write dialogue in its original language and do NOT translate it; lip-sync follows the language as written.
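As a rough illustration of combining the three audio cue types, the helper below assembles them into one prompt fragment. It is a sketch only: the function name, the sentence separators, and the parenthesised voice description are assumptions made for this example; only the `SFX:` and `Ambient noise:` labels and the quoted dialogue come from the text.

```python
def build_audio_prompt(sfx, ambient, dialogue_lines):
    """Compose the audio part of a prompt from the three cue types.

    sfx, ambient: lists of short sound descriptions.
    dialogue_lines: list of (spoken_text, voice_description) pairs.
    """
    parts = []
    if sfx:
        parts.append("SFX: " + ", ".join(sfx))
    if ambient:
        parts.append("Ambient noise: " + ", ".join(ambient))
    for text, voice in dialogue_lines:
        # Dialogue stays in quotes and in its original language.
        # The "(voice)" suffix is an assumed way to attach the voice description.
        parts.append(f'"{text}" ({voice})')
    if not parts:
        return ""
    return ". ".join(parts) + "."


prompt = build_audio_prompt(
    sfx=["glass shattering", "metallic clang"],
    ambient=["city traffic", "distant sirens", "wind"],
    dialogue_lines=[("We're almost there", "male voice, calm tone")],
)
print(prompt)
```

The resulting string would be appended to the visual description of the scene; empty inputs produce an empty string, so a silent shot needs no special handling.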
Gemini Omni is a multimodal flagship video generation model from Google (2026). It supports native 4K resolution, a reference system (up to 7 units: images and/or video), native audio with lip-sync. Duration from 4 to 10 seconds.
Gemini Omni is the next generation after VEO 3.1. Key differences: native 4K resolution (VEO 3.1 natively renders 1080p with optional upscaling to 4K), a reference system with up to 7 units combining images and video (VEO 3.1 accepts one image), and native audio with lip-sync in a single pass. It also offers higher detail, better temporal stability, and better motion physics.
Gemini Omni uses a unit system: each image = 1 unit, video = 2 units, limit — 7 units. You can upload up to 7 images, or 1 video + up to 5 images, or just a text prompt without references. The model will combine the context of all uploaded materials.
Native 4K means the model generates video at 3840×2160 resolution directly, rather than upscaling from a lower resolution. This provides significantly higher texture detail, sharpness of fine elements, and no scaling artifacts.
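To put the 3840×2160 figure in perspective, the short calculation below compares its pixel count with 1080p (1920×1080, the standard Full HD frame size). It is plain arithmetic and says nothing about how the model works internally.

```python
UHD_4K = (3840, 2160)   # native 4K frame, as stated in the text
FULL_HD = (1920, 1080)  # standard 1080p frame


def pixel_count(size):
    """Number of pixels in a (width, height) frame."""
    width, height = size
    return width * height


print(pixel_count(UHD_4K))                          # 8294400
print(pixel_count(UHD_4K) // pixel_count(FULL_HD))  # 4
```

A native 4K frame therefore carries four times as many pixels as a 1080p frame, which is the detail an upscaler would otherwise have to estimate.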
Without a video reference: 720p/1080p costs 72 to 144 credits and 4K costs 168 to 240 credits, depending on duration. With a video reference the price is fixed: 192 credits (720p/1080p) or 288 credits (4K).
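The pricing rules can be expressed as a small lookup. The sketch below copies the credit values from the pricing table and the fixed video-reference prices; the function and constant names are hypothetical, and it assumes only the four listed durations (4, 6, 8, 10 seconds) are available.

```python
HD, UHD = "720p/1080p", "4K"

# Credits by resolution tier and duration in seconds (no video reference).
TABLE = {
    HD: {4: 72, 6: 96, 8: 120, 10: 144},
    UHD: {4: 168, 6: 192, 8: 216, 10: 240},
}
# Fixed price when a video reference is used, regardless of duration.
FIXED_WITH_VIDEO_REFERENCE = {HD: 192, UHD: 288}


def _tier(resolution: str) -> str:
    """Map a resolution label to its pricing tier."""
    key = resolution.strip().lower()
    if key in ("720p", "1080p"):
        return HD
    if key == "4k":
        return UHD
    raise ValueError(f"unsupported resolution: {resolution!r}")


def quote_credits(resolution, duration_s=None, video_reference=False) -> int:
    """Return the price in credits for one generation."""
    tier = _tier(resolution)
    if video_reference:
        return FIXED_WITH_VIDEO_REFERENCE[tier]
    try:
        return TABLE[tier][duration_s]
    except KeyError:
        raise ValueError(f"unsupported duration: {duration_s!r}") from None


print(quote_credits("1080p", 4))                   # 72
print(quote_credits("4K", 10))                     # 240
print(quote_credits("4K", video_reference=True))   # 288
```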
Video reference takes 2 units out of 7. Upload a video (up to 100 MB, up to 30 seconds) and describe the changes: style transfer, character replacement, environment change. Camera and dynamics of the source video are preserved. You can add up to 5 images alongside the video reference. Fixed price regardless of duration.
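Before uploading, the stated limits for a video reference could be checked locally along these lines. This is a hedged sketch: the function name is hypothetical, and it assumes "100 MB" means 100,000,000 bytes, which the text does not specify.

```python
from typing import List

MAX_VIDEO_BYTES = 100 * 1000 * 1000  # assumption: "MB" means decimal megabytes
MAX_VIDEO_SECONDS = 30
MAX_IMAGES_WITH_VIDEO = 5  # 7 units total minus 2 for the video


def check_video_reference(size_bytes: int, duration_s: float,
                          image_count: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the upload looks valid."""
    problems = []
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("video exceeds 100 MB")
    if duration_s > MAX_VIDEO_SECONDS:
        problems.append("video is longer than 30 seconds")
    if image_count > MAX_IMAGES_WITH_VIDEO:
        problems.append("at most 5 images fit alongside a video reference")
    return problems


print(check_video_reference(80_000_000, 12.5, image_count=3))  # []
print(check_video_reference(150_000_000, 45, image_count=6))   # three problems
```

Returning a list of problems rather than raising on the first one lets a caller show every issue to the user at once.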
Yes, Gemini Omni generates native audio in a single pass with the video. Sound effects (SFX:), ambient sounds (Ambient noise:), and dialogue (text in quotes with voice specification) are supported. Lip-sync works in many languages — no need to translate dialogue to English.