These are tested Veo prompts for image-to-video — animating a still photo, portrait, product shot, or artwork. The single most important rule: the image already defines the subject, setting, and style, so your prompt should describe only the motion, the camera, and the audio — never re-describe the scene. Copy a prompt, attach your image in Veo, and change the bold details. This cluster is part of the wider Veo Prompt Library, which also covers product, UGC, and dialogue prompts.
The image-to-video mindset
In text-to-video you build the whole world with words. In image-to-video, the world already exists in the picture — your job is to add life to it. Spend the prompt on three things: what moves (and how forcefully), how the camera behaves, and what the scene sounds like. If you start re-describing what is already visible, you give Veo room to reinterpret and drift away from your image.
How to write Veo I2V prompts
- Describe motion, not the scene. “Slow push-in, the hair sways in a light breeze” — not a fresh description of the person.
- Use weighted verbs. Sway, ripple, drift, push, pull. Vague actions make the clip float; force verbs give motion weight.
- Name the camera move. Without a camera instruction, Veo tends to keep the shot nearly static, so state the dolly, orbit, crane, or push-in explicitly.
- Keep it small for realism. One or two clear actions beat a busy performance, especially for faces and products.
- Layer audio. Use “Ambient noise:” for the background bed, “SFX:” for specific sounds, and says, "..." to make a pictured person speak. Add “no background music” to protect dialogue.
- For start/end control, supply two frames and describe the transition; for character consistency across clips, use the reference-image (“ingredients”) feature.
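The checklist above can be sketched as a small prompt assembler. This is a minimal illustration, not part of Veo or any official SDK: the I2VPrompt class, its field names, and the "The person says" phrasing are hypothetical choices made for this example. It only builds the prompt text; you would still paste the result into Veo alongside your image.

```python
from dataclasses import dataclass, field


@dataclass
class I2VPrompt:
    """Hypothetical helper: holds only motion, camera, and audio cues."""

    motion: list[str]            # one or two weighted actions, e.g. "the hair sways"
    camera: str                  # explicit camera move, e.g. "Slow push-in"
    ambient: str = ""            # background sound bed
    sfx: list[str] = field(default_factory=list)  # specific sounds
    dialogue: str = ""           # line spoken by the pictured person
    keep_consistent: bool = False  # append the consistency cue for faces/products

    def render(self) -> str:
        # Enforce the "keep it small" rule: one or two clear actions.
        if not 1 <= len(self.motion) <= 2:
            raise ValueError("use one or two clear actions")

        # Camera move first, then the motion; the scene is never described.
        parts = [self.camera.rstrip(".") + ", " + ", ".join(self.motion) + "."]

        # Audio layers, using the labels from the checklist.
        if self.ambient:
            parts.append(f"Ambient noise: {self.ambient}.")
        for sound in self.sfx:
            parts.append(f"SFX: {sound}.")
        if self.dialogue:
            parts.append(f'The person says, "{self.dialogue}"')
            parts.append("No background music.")

        if self.keep_consistent:
            parts.append("Keep the subject consistent.")
        return " ".join(parts)


example = I2VPrompt(
    motion=["the hair sways in a light breeze"],
    camera="Slow push-in",
    ambient="quiet street",
    sfx=["distant bicycle bell"],
)
print(example.render())
```

Keeping the fields limited to motion, camera, and audio is deliberate: there is no slot for describing the subject or setting, so the structure itself discourages re-describing the image.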
Want to structure the motion and camera cues without starting from a blank prompt? Open the Veo Prompt Builder — the cinematic preset’s camera-move and lighting fields work well as a starting point for I2V motion, even though there is no dedicated I2V preset yet. For a photo of a person who needs to talk, pair it with the dialogue and audio prompts; for a product photo, see the product video prompts for the hero-shot language to reuse.
Image-to-video needs a model that accepts an input image plus a prompt. If you do not have direct Veo access, Pollo AI supports image-to-video across Veo and other models in one place, so you can attach your image, run this prompt, and re-roll the motion. Disclosure: affiliate link — we may earn a commission if you subscribe, at no extra cost to you. We only recommend tools we would use ourselves.
FAQ
How is an image-to-video prompt different from a text-to-video prompt?
With image-to-video, the picture already defines the subject, setting, composition, and style. Your prompt only needs to describe what moves, how the camera behaves, and what audio plays. Re-describing the scene wastes the prompt and can fight the image.
Why does my image drift or morph into something else?
The usual cause is too much motion or a prompt that re-describes the scene. Keep actions minimal and weighted, do not restate what the image shows, and add "keep the subject consistent" for faces and products.
Can I use a first and last frame in Veo?
Yes — Veo 3.1 supports start-and-end-frame interpolation. Provide both images and describe the transition between them rather than each frame. It gives the most control over how a clip begins and ends.
Does image-to-video generate audio?
Veo 3.1 image-to-video supports native audio. Note that the separate add/remove-object image-edit path runs on Veo 2 and does not generate audio, so check which feature you are using.
How do I keep a character consistent across shots?
Use the reference-image ("ingredients") feature: supply images of the character, object, or style, and Veo carries that look across multiple clips. Clips generated this way now include audio.