Veo 3.1 Guide

Veo 3.1 is Google DeepMind's video generation model, released in October 2025. Its most distinctive capability on the platform is native audio-visual co-generation: sound and video are generated together in a single pass, which gives it dialogue lip sync accuracy and ambient sound realism that other models find difficult to match. Each generation is capped at 8 seconds and rendered at 1080p, with 4K enhancement available.

Core Capabilities

Native audio generation (dialogue, ambient sound, and background music all in one pass)
Dialogue video: spoken lines lip-synced to the on-screen characters with no post-processing
Product showcase video: smooth camera movement and the highest visual quality of any video model on the platform
Video extension: continue generating from the end of an existing video clip

Product Showcase Video Prompt

[camera movement] of [detailed product description],
[lighting: soft studio lighting / dramatic backlighting / golden hour],
[background: clean white surface / dark marble],
[optional effect: particle effects / water splash / light refraction],
[audio: elegant orchestral music / ambient city sounds / silence],
[duration: 4 / 6 / 8] seconds.
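When producing many product variants, the template can be filled programmatically. The sketch below is a hypothetical Python helper (build_product_prompt and its parameter names are illustrative, not a platform API) that assembles the slots in the template's order, treats the effect slot as optional, and rejects durations other than 4, 6, or 8 seconds.

```python
# Hypothetical helper that fills the product showcase template above.
# Parameter names are illustrative; the platform only sees the final string.

ALLOWED_DURATIONS = (4, 6, 8)  # seconds, per the template's duration slot


def build_product_prompt(
    camera: str,
    product: str,
    lighting: str,
    background: str,
    audio: str,
    duration: int,
    effect: str = "",
) -> str:
    """Assemble a product showcase prompt in the template's slot order."""
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}, got {duration}")
    parts = [f"{camera} of {product}", lighting, background]
    if effect:  # the effect slot is optional, so skip it when empty
        parts.append(effect)
    parts.append(audio)
    # One slot per line, comma-terminated, ending with the duration sentence.
    return ",\n".join(parts) + f",\n{duration} seconds."


print(build_product_prompt(
    "slow push-in",
    "a matte black wireless earbud case",
    "soft studio lighting",
    "dark marble",
    "elegant orchestral music",
    6,
))
```

Keeping the slot order fixed makes variants easy to compare: only the product description or one slot changes between runs.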

Dialogue Video Prompt

A [medium / close-up] shot of [scene description].
[Ambient audio: café ambience / city background / quiet office].
[Character A description] says, '[dialogue A]'.
[Character B description] replies, '[dialogue B]'.

Wrap each line of dialogue in single quotes. Shorter lines give more accurate lip sync; long, complex sentences noticeably reduce accuracy.
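The quoting rule and the short-line advice can be checked before submitting. This is a minimal Python sketch with hypothetical names; MAX_WORDS = 15 is an assumed budget chosen for illustration, since the guide only says that shorter lines sync better without giving a limit.

```python
# Hypothetical dialogue prompt builder following the template above.
from __future__ import annotations

# ASSUMPTION: an illustrative per-line word budget, not a documented limit.
MAX_WORDS = 15


def build_dialogue_prompt(
    shot: str,
    scene: str,
    ambience: str,
    lines: list[tuple[str, str, str]],
) -> tuple[str, list[str]]:
    """Build the prompt and collect warnings for lines over MAX_WORDS.

    `lines` holds (character description, verb, dialogue) tuples,
    e.g. ("A woman in a red coat", "says", "I found the tickets").
    """
    out = [f"A {shot} shot of {scene}.", f"Ambient audio: {ambience}."]
    warnings = []
    for speaker, verb, text in lines:
        n_words = len(text.split())
        if n_words > MAX_WORDS:
            warnings.append(f"{speaker}: {n_words} words, consider shortening")
        out.append(f"{speaker} {verb}, '{text}'.")  # dialogue in single quotes
    return "\n".join(out), warnings


prompt, warnings = build_dialogue_prompt(
    "medium",
    "two friends at a café table",
    "café ambience",
    [
        ("A woman in a red coat", "says", "I found the tickets"),
        ("A man in glasses", "replies", "Perfect, we leave at eight"),
    ],
)
print(prompt)
print(warnings)
```

Returning warnings instead of raising lets a long line through when it is intentional, while still flagging it for review.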

Camera Movement Quick Reference

slow push-in — camera gradually moves in toward the subject (common for product close-ups)
slow orbit around — 360-degree rotation around the subject
macro close-up with shallow depth of field — extreme close-up with a blurred background
low-angle tracking shot — camera follows the subject from a low angle
overhead pull-back — camera pulls back from an overhead angle
static camera, subject movement — fixed camera, moving subject

Duration Selection Guide

4 seconds → Logo animations, simple product close-ups
6 seconds → Single-scene product showcases
8 seconds → Complete narrative, dialogue video, multi-scene transitions

For content longer than 8 seconds, generate in segments and edit together. Splitting by scene gives more predictable results than one long prompt.
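Splitting by scene can be planned up front. The Python sketch below (plan_segments and snap_duration are hypothetical names) maps each scene's desired length to the shortest allowed clip of 4, 6, or 8 seconds, and splits any scene longer than 8 seconds into consecutive parts. Rounding up rather than down is a design assumption: a slightly long clip can be trimmed in the edit, but a short one cannot be stretched.

```python
# Sketch of "split by scene" planning; each entry becomes one generation.
from __future__ import annotations

ALLOWED_DURATIONS = (4, 6, 8)  # seconds, from the duration guide
MAX_CLIP = 8  # per-generation cap


def snap_duration(seconds: float) -> int:
    """Return the shortest allowed clip length that covers `seconds`."""
    for d in ALLOWED_DURATIONS:
        if seconds <= d:
            return d
    raise ValueError(f"{seconds}s exceeds the {MAX_CLIP}s clip cap")


def plan_segments(scenes: list[tuple[str, float]]) -> list[tuple[str, int]]:
    """Map (scene, desired seconds) pairs to clips of 4, 6, or 8 seconds.

    Scenes longer than MAX_CLIP are split into consecutive 8-second parts;
    the final part is rounded up to the next allowed length.
    """
    plan: list[tuple[str, int]] = []
    for desc, seconds in scenes:
        if seconds <= 0:
            raise ValueError(f"scene {desc!r} needs a positive duration")
        parts = []
        remaining = seconds
        while remaining > MAX_CLIP:
            parts.append(MAX_CLIP)
            remaining -= MAX_CLIP
        parts.append(snap_duration(remaining))
        if len(parts) == 1:
            plan.append((desc, parts[0]))
        else:
            plan.extend((f"{desc} (part {i})", d) for i, d in enumerate(parts, 1))
    return plan


print(plan_segments([("logo reveal", 3), ("product orbit", 6), ("feature walkthrough", 12)]))
# [('logo reveal', 4), ('product orbit', 6),
#  ('feature walkthrough (part 1)', 8), ('feature walkthrough (part 2)', 4)]
```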

About Pricing

Veo 3.1 is billed in platform credits per second of generated video: Fast mode costs approximately $0.15/second and Standard mode approximately $0.40/second. Disabling audio generation reduces the cost by about 30%. For dialogue videos, Standard mode gives better lip sync accuracy.
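The arithmetic is simple enough to script when budgeting a batch. This Python sketch uses only the approximate rates quoted above; estimate_cost is a hypothetical helper, and AUDIO_OFF_FACTOR = 0.70 is an assumption derived from "about 30%". Actual credit prices on the platform may differ, so treat the output as a rough estimate.

```python
# Rough cost estimate from the approximate per-second rates in this guide.
RATES_PER_SECOND = {"fast": 0.15, "standard": 0.40}  # USD, approximate
AUDIO_OFF_FACTOR = 0.70  # ASSUMPTION: "about 30%" cheaper with audio disabled


def estimate_cost(seconds: float, mode: str = "standard", audio: bool = True) -> float:
    """Approximate USD cost of generating `seconds` of video."""
    if mode not in RATES_PER_SECOND:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(RATES_PER_SECOND)}")
    cost = seconds * RATES_PER_SECOND[mode]
    if not audio:
        cost *= AUDIO_OFF_FACTOR
    return round(cost, 2)


print(estimate_cost(8))                # Standard, with audio
print(estimate_cost(8, mode="fast"))   # Fast, with audio
print(estimate_cost(8, audio=False))   # Standard, audio disabled
```

By these rates, an 8-second Standard clip with audio comes to about $3.20, the same clip in Fast mode about $1.20, and Standard with audio disabled about $2.24.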
