Veo 3.1 AI Video Generator

Text, Image, and Reference-Guided Video

Use Veo 3.1 for controlled cinematic clips with text prompts, first/last-frame guidance, reference images, vertical or horizontal framing, and Lite/Fast/Quality choices.

Veo 3.1 workspace controls:

  • Prompt (required; the counter allows up to 10,000 characters)
  • Mode
  • Reference Images (1-2 first/last frame images; JPG, PNG, WEBP)
  • Aspect Ratio
  • Quality
  • Seed (optional)
  • Watermark (optional)
  • Auto-Translate Prompt (translates non-English prompts before generation)

Veo 3.1 model details

A deeper guide to Veo 3.1 input modes, quality choices, reference control, vertical video, and production settings in GemiOmni.

Veo 3.1 is built for controlled video work: text prompts, image-to-video, reference-to-video, vertical or horizontal framing, and quality modes. Use it when the clip needs a tighter match between the prompt, the reference images, the camera movement, and the placement the video is made for.

  • Text, image, reference
  • Lite/Fast/Quality
  • 16:9 and 9:16
  • Seed control

Inputs
01

What do Text, Image, and Reference modes mean?

Text mode creates a scene from the prompt alone. Image mode uses first and last frame guidance for stronger visual control. Reference mode can use up to 3 images to guide subject, style, or composition when consistency matters more than open-ended exploration.

  • Text-to-video for new scenes
  • Image-to-video for frame control
  • Reference-to-video for consistency
  • Up to 3 reference images
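
The mode rules above can be written down as a small check to run before submitting a job. This is a hypothetical sketch, not part of any GemiOmni or Veo API: the names `Mode`, `IMAGE_LIMITS`, and `check_images` are illustrative. It assumes Text mode takes no images, Image mode takes 1-2 first/last frame images, and Reference mode takes 1-3 images. It also assumes the accepted upload formats are JPG, PNG, and WEBP, with `.jpeg` treated as an alias for JPG.

```python
from enum import Enum
from pathlib import Path


class Mode(Enum):
    TEXT = "text"            # prompt only
    IMAGE = "image"          # first/last frame guidance
    REFERENCE = "reference"  # subject, style, or composition references


# (min, max) image counts per mode.
# Assumption: REFERENCE needs at least 1 image; the guide only states "up to 3".
IMAGE_LIMITS = {
    Mode.TEXT: (0, 0),
    Mode.IMAGE: (1, 2),
    Mode.REFERENCE: (1, 3),
}

# Assumption: ".jpeg" is accepted wherever JPG is.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_images(mode: Mode, image_paths: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    low, high = IMAGE_LIMITS[mode]
    count = len(image_paths)
    if not low <= count <= high:
        problems.append(f"{mode.value} mode expects {low}-{high} images, got {count}")
    for path in image_paths:
        if Path(path).suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"unsupported image format: {path}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a form show every issue at once.
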
Quality
02

How should I choose Lite, Fast, or Quality?

Use Lite or Fast when you want faster testing and prompt iteration. Use Quality when the shot needs better motion, light, detail, or production value. Reference-to-video currently uses Fast mode, so switch modes deliberately when you move between reference control and final-quality testing.

  • Lite/Fast for iteration
  • Quality for final candidates
  • Reference mode uses Fast
  • Test prompts before final runs
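
The quality rule above, including the Fast-only restriction on reference-to-video, can be expressed as a small helper. This is a minimal sketch under assumptions: `resolve_quality` and the lowercase tier names are hypothetical and do not come from a documented GemiOmni interface.

```python
QUALITY_TIERS = ("lite", "fast", "quality")


def resolve_quality(mode: str, requested: str) -> str:
    """Return the tier that would actually run for a mode and a requested tier."""
    if requested not in QUALITY_TIERS:
        raise ValueError(f"unknown quality tier: {requested!r}")
    # Per the guide, reference-to-video currently runs in Fast mode.
    if mode == "reference":
        return "fast"
    return requested
```

Making the override explicit in one place keeps a workflow from silently expecting Quality output from a reference run.
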
Specs
03

Which output settings does Veo 3.1 support?

Veo 3.1 in GemiOmni supports 16:9 and 9:16 output, Auto aspect where the selected image mode allows it, plus seed and watermark controls. It is especially useful when a single scene needs controlled framing for social video, ads, and cinematic placements.

  • 16:9 and 9:16
  • Auto for supported image modes
  • Seed support
  • Watermark control
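
The output settings above can be modeled as a small settings object with an aspect-ratio check. This is a sketch only: `OutputSettings`, `validate_output`, and `AUTO_MODES` are hypothetical names. The guide does not say exactly which image modes accept Auto, so the set used here is an assumption, as is treating the watermark as optional text.

```python
from dataclasses import dataclass
from typing import Optional

FIXED_RATIOS = {"16:9", "9:16"}
# Assumption: only the image (first/last frame) mode accepts "auto".
AUTO_MODES = {"image"}


@dataclass(frozen=True)
class OutputSettings:
    aspect_ratio: str = "16:9"
    seed: Optional[int] = None       # optional seed; None leaves it unset
    watermark: Optional[str] = None  # assumed to be optional text


def validate_output(mode: str, settings: OutputSettings) -> None:
    """Raise ValueError if the aspect ratio is not usable in the given mode."""
    ratio = settings.aspect_ratio.lower()
    if ratio == "auto":
        if mode not in AUTO_MODES:
            raise ValueError(f"Auto aspect is not available in {mode} mode")
        return
    if ratio not in FIXED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {settings.aspect_ratio}")
```
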
Comparison
04

Veo 3.1 vs Kling or Wan: when should I switch?

Choose Veo 3.1 for controlled cinematic scenes and reference-driven output. Try Kling for dynamic physical motion, Wan for stylized or flexible video prompts, Seedance for people and movement, and Hailuo for quick visual exploration.

  • Veo 3.1 for control
  • Kling for motion
  • Wan for style
  • Seedance for people and action
Prompting
05

What prompt structure works best for Veo 3.1?

Use a compact production brief: subject, action, camera movement, lighting, location, mood, and intended platform. If you upload references, explain whether each image controls identity, style, background, or first/last frame.

  • Subject plus action
  • Camera and lighting
  • Reference roles
  • Platform-specific framing
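
The production-brief structure above can be assembled with a short helper. This is a hypothetical sketch: `build_prompt` and its sentence layout are illustrative, not a required format for Veo 3.1. The 10,000 limit is taken from the prompt counter in the workspace and is assumed to count characters.

```python
MAX_PROMPT_CHARS = 10_000  # assumed to match the workspace prompt counter


def build_prompt(subject, action, camera, lighting, location, mood, platform,
                 reference_roles=None):
    """Join the brief fields into one prompt and state what each reference controls."""
    parts = [
        f"{subject} {action}.",
        f"Camera: {camera}.",
        f"Lighting: {lighting}.",
        f"Location: {location}.",
        f"Mood: {mood}.",
        f"Platform: {platform}.",
    ]
    # One sentence per uploaded reference, e.g. identity, style, background.
    for index, role in enumerate(reference_roles or [], start=1):
        parts.append(f"Reference image {index} controls {role}.")
    prompt = " ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}")
    return prompt
```

For example, `build_prompt("A courier", "cycles through rain", "low tracking shot", "neon reflections", "a narrow side street", "tense", "9:16 social video", ["identity", "style"])` yields a single brief that names the role of each reference image.
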

Veo 3.1 Features

Controls exposed in the GemiOmni workspace.

Three Input Modes

Start from text, guide motion with first/last frames, or use reference images for stronger consistency.

Lite, Fast, Quality

Pick a cost and quality path based on whether you are testing prompts or preparing a final candidate.

Aspect and Seed Control

Choose 16:9 or 9:16 output, use Auto where the image mode supports it, and set seed and watermark options when the mode allows them.

How to Use

3 simple steps.

1

Choose a Mode

Use text-to-video, image-to-video, or reference-to-video depending on how much visual control you need.

2

Configure

Choose Lite, Fast, or Quality, then set aspect ratio, seed, watermark, and references where supported.

3

Generate

Click Generate and download when the task completes.

FAQ

Common questions.