AI Video Reference Image Checklist: How to Get Reusable Results

Omniveo Team, May 14, 2026

Reference images can make AI video generation dramatically more controllable, but only when they are prepared like production assets. A messy reference set forces the model to guess. A clean reference set tells the model what to preserve, what to animate, and what to ignore.

This checklist is for product marketers, creators, and teams building repeatable image-to-video workflows.

The five-reference rule

Before uploading anything, label each reference with one of five roles:

  1. Identity: the person, character, mascot, or product that must remain recognizable.
  2. Geometry: shape, silhouette, packaging, layout, or room structure.
  3. Material: fabric, glass, metal, skin texture, food surface, or lighting texture.
  4. Environment: location, background, weather, time of day.
  5. Motion: a pose, frame, or previous clip that suggests movement.

If a reference has no role, remove it. More references do not automatically create more control.
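The labeling step above can be sketched in a few lines of Python. This is only an illustration: the `Reference` class, the `keep_labeled` helper, and the file names are hypothetical and do not belong to any particular video tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """The five roles from the checklist."""
    IDENTITY = "identity"
    GEOMETRY = "geometry"
    MATERIAL = "material"
    ENVIRONMENT = "environment"
    MOTION = "motion"


@dataclass(frozen=True)
class Reference:
    path: str              # hypothetical local file name
    role: Optional[Role]   # None means nobody could name a job for this image


def keep_labeled(references: list[Reference]) -> list[Reference]:
    """Drop every reference that has no assigned role."""
    return [ref for ref in references if ref.role is not None]


candidates = [
    Reference("portrait.png", Role.IDENTITY),
    Reference("studio.png", Role.ENVIRONMENT),
    Reference("moodboard.png", None),  # no clear job, so it is removed
]
upload_set = keep_labeled(candidates)
print([ref.path for ref in upload_set])
```

Forcing every image through an explicit role field makes the "no role, no upload" rule a mechanical check rather than a judgment call at upload time.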

Clean input beats clever prompting

Use reference images that are:

  • High enough in resolution to show the detail you care about.
  • Not heavily filtered unless the filter is the style target.
  • Free of watermarks, UI overlays, and random text.
  • Cropped around the important subject.
  • Consistent in lighting when identity or product accuracy matters.

If the product label is tiny in the uploaded photo, do not expect the model to preserve it. Upload a clean packshot and tell the model which details matter.
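One way to apply the list above consistently is a small pre-upload check. The sketch below assumes the image properties have already been measured or tagged by a person; the `ImageInfo` fields and both thresholds are illustrative assumptions, not values recommended by any model vendor. Lighting consistency across a set is not covered here.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    name: str
    width: int
    height: int
    subject_fraction: float          # share of the frame covered by the subject, 0.0 to 1.0
    has_overlay: bool                # watermark, UI overlay, or stray text
    heavily_filtered: bool
    filter_is_style_target: bool = False


# Assumed thresholds for illustration only; tune them for your own assets.
MIN_SHORT_SIDE = 1024
MIN_SUBJECT_FRACTION = 0.3


def preflight(info: ImageInfo) -> list[str]:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if min(info.width, info.height) < MIN_SHORT_SIDE:
        problems.append("resolution too low for the detail that matters")
    if info.has_overlay:
        problems.append("contains a watermark, UI overlay, or stray text")
    if info.heavily_filtered and not info.filter_is_style_target:
        problems.append("heavy filter that is not the style target")
    if info.subject_fraction < MIN_SUBJECT_FRACTION:
        problems.append("subject is too small; crop tighter")
    return problems


packshot = ImageInfo("packshot.png", 2048, 2048, 0.7, False, False)
screenshot = ImageInfo("screenshot.png", 800, 1400, 0.1, True, False)

for image in (packshot, screenshot):
    print(image.name, preflight(image) or "ok")
```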

Prompt each reference explicitly

Bad:

Use these references to make a cool fashion video.

Better:

Use reference 1 for the model's face and outfit. Use reference 2 for the studio lighting and gray background. Use reference 3 only for the handbag shape and leather texture. Create an 8-second slow push-in with subtle fabric movement. Do not change the face, outfit color, or handbag proportions.
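The structure of the better prompt can be generated from a list of assignments, so that no reference is left without a job. This is a minimal sketch; `Assignment`, `build_prompt`, and `join_with_or` are hypothetical helper names, and the output is plain text that you would paste into whatever tool you use.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    job: str                  # what this reference controls
    exclusive: bool = False   # True renders "only for" to narrow the reference's scope


def join_with_or(items: list[str]) -> str:
    """Join items as 'a', 'a or b', or 'a, b, or c'."""
    if len(items) <= 2:
        return " or ".join(items)
    return ", ".join(items[:-1]) + ", or " + items[-1]


def build_prompt(assignments: list[Assignment], shot: str, locked: list[str]) -> str:
    """Number the references in upload order and state each one's job."""
    sentences = []
    for number, assignment in enumerate(assignments, start=1):
        scope = "only for" if assignment.exclusive else "for"
        sentences.append(f"Use reference {number} {scope} {assignment.job}.")
    sentences.append(shot)
    if locked:
        sentences.append(f"Do not change {join_with_or(locked)}.")
    return " ".join(sentences)


prompt = build_prompt(
    assignments=[
        Assignment("the model's face and outfit"),
        Assignment("the studio lighting and gray background"),
        Assignment("the handbag shape and leather texture", exclusive=True),
    ],
    shot="Create an 8-second slow push-in with subtle fabric movement.",
    locked=["the face", "outfit color", "handbag proportions"],
)
print(prompt)
```

With these inputs the function reproduces the better prompt shown above word for word.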

Preserve successful inputs

The best reference workflow is not only about upload quality. It also needs persistence. When a generation works, save the full setup:

| Field | Why it matters |
| --- | --- |
| Prompt | Captures the creative instruction. |
| Model and mode | Text-to-video and image-to-video behave differently. |
| Aspect ratio | Vertical and landscape shots compose differently. |
| Duration | Motion pacing changes with length. |
| Resolution | Affects finishing quality and credit cost. |
| Sound setting | Determines whether audio must be directed. |
| Reference URLs | Lets the team regenerate or iterate later. |
| Output URLs | Keeps the generated asset available after temporary links expire. |

If these inputs are stored, history becomes a production tool instead of a gallery. A teammate can click an old generation, recover the original prompt and references, adjust one variable, and generate a controlled variation.
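A saved setup can be as simple as one JSON document per generation. The sketch below mirrors the fields listed above; the `GenerationRecord` class, the model name, and the example.com URLs are placeholders for illustration, not a real product schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GenerationRecord:
    prompt: str
    model: str
    mode: str                  # e.g. "text-to-video" or "image-to-video"
    aspect_ratio: str
    duration_seconds: int
    resolution: str
    sound_enabled: bool
    reference_urls: list[str] = field(default_factory=list)
    output_urls: list[str] = field(default_factory=list)


def to_json(record: GenerationRecord) -> str:
    """Serialize the full setup so it can be stored next to the output."""
    return json.dumps(asdict(record), indent=2)


def from_json(text: str) -> GenerationRecord:
    """Rebuild the setup from stored JSON."""
    return GenerationRecord(**json.loads(text))


record = GenerationRecord(
    prompt="Use reference 1 for the model's face and outfit.",
    model="example-video-model",           # placeholder name
    mode="image-to-video",
    aspect_ratio="9:16",
    duration_seconds=8,
    resolution="720p",
    sound_enabled=False,
    reference_urls=["https://example.com/refs/portrait.png"],
    output_urls=["https://example.com/outputs/clip-001.mp4"],
)

saved = to_json(record)
restored = from_json(saved)
print(restored == record)
```

Storing durable copies of the output files, rather than only the temporary links a generator returns, is what keeps `output_urls` useful later.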

A repeatable workflow

Use this operating rhythm:

  1. Upload only the references that have a clear role.
  2. Write a prompt that assigns each reference a job.
  3. Generate the first clip at the cheapest acceptable setting.
  4. Fix composition before fixing detail.
  5. Save the working setup before increasing resolution.
  6. Reuse the same references for variants instead of re-uploading different crops.
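Steps 3, 5, and 6 amount to changing one variable at a time while the references stay fixed. A minimal sketch of that discipline, assuming a hypothetical `Setup` record and placeholder URLs:

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Setup:
    prompt: str
    reference_urls: tuple[str, ...]
    aspect_ratio: str
    duration_seconds: int
    resolution: str


def changed_fields(old: Setup, new: Setup) -> list[str]:
    """List the names of the fields that differ between two setups."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


# Step 3: draft at the cheapest acceptable setting (values are placeholders).
draft = Setup(
    prompt="Slow push-in on the handbag with subtle fabric movement.",
    reference_urls=("https://example.com/refs/handbag.png",),
    aspect_ratio="9:16",
    duration_seconds=5,
    resolution="480p",
)

# Steps 5 and 6: keep the working draft, then raise only the resolution.
final = replace(draft, resolution="1080p")

print(changed_fields(draft, final))
```

Because the record is frozen, the working draft cannot be modified by accident; every variant is a new object that differs from it in a known set of fields.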

Common failure modes

| Failure | Likely cause | Fix |
| --- | --- | --- |
| Face changes between shots | Identity reference is unclear or mixed with style references | Use one clean portrait and say "preserve identity." |
| Product shape changes | Prompt asks for motion that deforms the product | Add "keep proportions unchanged" and reduce action. |
| Scene looks generic | Environment reference is weak | Add a location reference and describe time of day. |
| Audio feels random | Sound was not directed | Name ambience, foley, music, and dialogue separately. |
| Re-run cannot match old result | Inputs were not saved | Store prompt, settings, references, and output URLs. |
