Veo Dialogue & Audio Prompts

These are tested Veo prompts for dialogue, lip-sync, sound effects, and ambient audio, the part most AI video models still get wrong. To make a character speak, write the line in quotation marks after a speech verb and a comma: A woman says, "We have to leave now." That comma-and-quotes pattern is what cues Veo’s lip-sync; without it, the words often appear as on-screen text instead. Copy any prompt below, paste it into Veo, and change the bold details. This page is one part of the full Veo Prompt Library, which also covers product, UGC, and image-to-video prompts.

Why Veo’s audio is different

Veo generates picture and sound together in one pass, not as a soundtrack bolted on afterward. That joint process is why lip movements line up with the words and why ambient sounds match the scene. It also means you should prompt audio as its own layer of the scene — name the dialogue, the sound effects, and the ambience explicitly — rather than tacking “with sound” onto the end of a visual prompt and hoping.
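
One way to keep the layers separate is to assemble the prompt from named parts. The sketch below is a hypothetical helper, not part of any Veo tooling: the `ScenePrompt` class and its field names are invented for illustration, and the "SFX:" and "Ambient noise:" labels simply mirror the wording used by the prompts in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePrompt:
    """Hypothetical container with one field per layer of a Veo prompt."""
    visuals: str                                  # shot, subject, lighting, camera
    dialogue: str = ""                            # e.g. 'She says, "Hello."'
    sfx: list[str] = field(default_factory=list)
    ambience: list[str] = field(default_factory=list)
    music: bool = False                           # False adds "No background music."

    def render(self) -> str:
        """Join the layers into one prompt string, naming each audio layer."""
        parts = [self.visuals.strip()]
        if self.dialogue:
            parts.append(self.dialogue.strip())
        if self.sfx:
            parts.append("SFX: " + ", ".join(self.sfx) + ".")
        if self.ambience:
            parts.append("Ambient noise: " + ", ".join(self.ambience) + ".")
        if not self.music:
            parts.append("No background music.")
        if not self.dialogue:
            parts.append("No dialogue.")
        return " ".join(parts)


forge = ScenePrompt(
    visuals="A blacksmith hammers a glowing blade in a dark forge.",
    sfx=["the sharp clang of hammer on steel", "the hiss of hot metal"],
    ambience=["low roar of the forge fire"],
)
print(forge.render())
```

Keeping each layer in its own field makes it harder to forget one: an empty dialogue field becomes an explicit "No dialogue." rather than a silent omission.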

How to write Veo audio prompts

Ready to script a scene? Build it in the Veo Prompt Builder — the dialogue preset starts from these same says-quote and no-subtitles defaults. If the delivery is more casual than scripted, the UGC-style prompts show the handheld, selfie-vlog version of a spoken line, and the product video prompts show how to add a short spoken hook to an ad.

Got your prompt? Run it on a model with native audio

These prompts need a Veo-capable runner. If you do not have direct Veo access, Pollo AI lets you run Veo and other video models in one place, so you can test a dialogue prompt and re-roll without juggling accounts. Disclosure: this is an affiliate link — we may earn a commission if you subscribe, at no extra cost to you. We only suggest tools we would use to run these prompts.

Prompt deck

Copy a format, check the evidence, then customize it.

20 prompts · 10 evidenced · 10 community · 0 owner-tested

Veo / Dialogue / talking / lip-sync

Historical figure explains a concept to camera (Pythagoras)

Why it works: Naming a well-known figure and a concept gives Veo enough context to generate period-appropriate speech, setting, and delivery without you writing a script, which shows that native audio can carry an entire explanatory monologue from a one-line brief.

Prompt
Pythagoras explaining his theorem, in ancient Greece

Tweak: A minimal prompt that still yields a full spoken explanation with period-accurate delivery. Swap the historical figure and the concept. Veo writes the explanation itself, so add a quoted line only if you need exact wording.

Credit: @skirano, via jax-explorer/awesome-veo3-videos

Two-person conversation (lip-synced dialogue)

Prompt
Medium two-shot in a dim 1940s detective office, rain streaking the window. A **weary middle-aged detective in a rumpled grey suit** sits behind the desk; a **young woman in a red coat** stands in the doorway. The detective looks up and says in a tired, gravelly voice, "Of all the offices in this town, you had to walk into mine." Static camera, soft low-key lamp light, shallow depth of field, film-noir grade. Ambient noise: faint rain on glass, a ticking clock. No background music.

Tweak: Change the bold characters and the quoted line; keep the line under ~12 words so it fits an 8-second clip.

Single character talking to camera (clean lip-sync)

Prompt
Close-up of a **friendly barista in her late 20s with curly hair**, standing behind a coffee counter, looking directly into the lens. She smiles and says in a warm, upbeat voice, "Pull up a stool — this one's on the house." Static handheld feel, soft window daylight, photoreal, 4K. No background music, no on-screen text, no subtitles.

Tweak: Swap the bold character and the quoted line. Keep "no on-screen text, no subtitles" to stop Veo rendering the words as captions.

Scripted rap-battle dialogue between two characters (dual accents, lip-sync)

Why it works: Naming a distinct accent and subject per speaker ("British accent about gravity", "German accent about relativity") gives Veo two clearly separable voice targets, which is why the lip-sync and the back-and-forth timing hold up even with a long, lyrical script.

Prompt
A high-energy rap battle between Isaac Newton and Albert Einstein on a futuristic sci-fi stage. The camera alternates between close-ups and dramatic wide shots as they diss each other with sharp lyrics. Newton, in a classic 17th-century outfit, raps with a British accent about gravity and apples. Einstein, with wild hair and a German accent, fires back about relativity and space-time. Their lip-sync is perfectly timed to the beat, and their facial expressions are intense and animated. The background pulses with neon lights and holographic equations, reacting to the rhythm. The crowd of AI-generated scientists cheers them on in sync with the music. It feels like a rap battle from another dimension.

Tweak: The most demanding dialogue test in this library: two accented characters, rhymed lines, and crowd reaction all lip-synced to a beat. Swap the two characters and their signature topics; keep one accent cue per character so the voices stay distinct.

Credit: @ZHO_ZHO_ZHO, via jax-explorer/awesome-veo3-videos

Musical performance with full vocal delivery (opera singer)

Why it works: A minimal setup ("an opera singer singing on stage") gives Veo room to generate a full musical performance with confident timing, showing that native audio extends to sustained singing, not only short spoken lines. That makes it useful evidence for the "Singing / musical delivery" prompt at the end of this deck, which is otherwise untested.

Prompt
an opera singer singing on stage.

Tweak: Proof that a bare-bones prompt can still produce a sustained sung vocal performance, not just a spoken line. Add the venue, costume, or a specific aria style for more control; expect more re-rolls than spoken dialogue since singing is less consistent.

Credit: @jerrod_lew, via jax-explorer/awesome-veo3-videos

Narrated voice-over with in-scene action (streamer commentary)

Why it works: Framing the scene as an achievement moment ("getting a victory royale") cues Veo to generate reactive, in-character commentary rather than a flat description. The pattern is reusable for any UGC-style clip that layers a voice-over on top of action.

Prompt
Streamer getting a victory royale with just his pickaxe

Tweak: A short scene-plus-commentary brief: Veo generates the streamer's own excited narration synced to the action. Swap the game/activity and the tool; keep the format "[person] doing [feat] with [object]" to trigger commentary-style delivery.

Credit: @mattshumer_, via jax-explorer/awesome-veo3-videos

Observational narration over a scene (mockumentary voice)

Why it works: Describing two roles and a camera pan ("pans over to...taking notes") in one sentence gives Veo both a speaker and an audience reaction to render, which is why this format reliably produces layered classroom dialogue instead of one flat voice.

Prompt
A college professor doing a class on Gen Z slang and the video pans over to all the boomers taking notes and seeming super interested

Tweak: A single-sentence setup that yields both a lecturing voice and reactive classroom dialogue. Swap the subject being taught and the reacting group for other mockumentary-style scenes.

Credit: @HonestBlogging, via jax-explorer/awesome-veo3-videos

Emotional delivery (voice direction before the line)

Prompt
Medium close-up of an **astronaut in a worn flight suit** inside a cramped capsule, soft instrument glow on her face. She stares out the porthole and says, in a hushed, awestruck whisper, "I never thought I'd actually see it." Slow push-in, cinematic low-key lighting, photoreal. Ambient noise: the quiet hum of the capsule. No background music.

Tweak: Change the emotion cue ("hushed, awestruck whisper") to "trembling, panicked" or "flat, exhausted" to reshape the vocal performance.

Mouth-sound SFX driving the visuals (voice-artist sound design)

Why it works: Specifying who or what makes each sound, rather than just naming the sound, gives Veo a consistent audio character to render across many quick cuts, which is why the sound-to-visual sync holds together over a fast-paced, multi-scene sequence instead of drifting.

Prompt
A dynamic camera glides through a miniature LEGO world, where an epic adventure unfolds. All sound effects—footsteps, explosions, cars, dragons—are created using mouth sounds by a single AI-generated voice artist. As each sound is made, the visuals instantly respond: LEGO characters jump into action, cars race, spaceships take off, volcanoes erupt. The journey moves through LEGO-built environments—city streets, underwater ruins, space stations, and lava lairs. The video is fast-paced, playful, and visually rich, like a blend between The LEGO Movie and next-gen AI storytelling. The sound-to-visual sync creates a magical, toy-driven universe where imagination controls reality.

Tweak: An advanced SFX-control pattern: naming the sound *source* ("mouth sounds by a single AI-generated voice artist") instead of describing the sounds directly. Swap the world/theme and the list of sound-triggered actions; keep the sound-drives-visual framing.

Credit: @ZHO_ZHO_ZHO, via jax-explorer/awesome-veo3-videos

Sung performance with crowd reaction (character + setting + song topic)

Why it works: Giving the song a specific, absurd topic ("how things used to be in the Mesozoic Era") rather than "singing a song" gives Veo actual content to perform, and scripting the crowd's reaction closes the loop so the clip reads as a complete bit rather than a performance that just stops.

Prompt
A dinosaur with a white fedora and a Hawaiian shirt playing an acoustic guitar on stage at a small waterside bar in Puerto Rico. The dinosaur is singing about how things used to be in the Mesozoic Era. People are clapping and laughing.

Tweak: A named performer, a specific song topic, and a scripted crowd reaction in one prompt. Swap the character, the venue, and the song subject; keep the closing crowd-reaction line, which gives Veo a payoff to land the performance on.

Credit: @CitizenPlain, via jax-explorer/awesome-veo3-videos

Sound-effects-led scene (no dialogue)

Prompt
A **blacksmith in a leather apron** hammers a glowing orange blade on an anvil in a dark forge. Sparks fly with each strike. Slow tracking shot around the anvil, warm firelight, photoreal, slow motion on the sparks. SFX: the sharp clang of hammer on steel, the hiss of hot metal, crackling embers. Ambient noise: low roar of the forge fire. No background music, no dialogue.

Tweak: Replace the bold subject and the SFX list. Name each sound precisely ("sharp clang", "hiss") rather than writing "blacksmith sounds".

Ambient soundscape (mood without speech)

Prompt
Wide establishing shot of a **rain-soaked Tokyo alley at night**, neon signs reflected in puddles, steam rising from a vent. Slow forward dolly down the alley. Cinematic, moody, photoreal, anamorphic look. Ambient noise: steady rain, distant traffic, the buzz of a flickering neon sign, a faint train passing. No dialogue, no background music.

Tweak: Change the location and rebuild the ambient list from what would actually be heard there: birds and wind for a forest, surf and gulls for a beach.

Voice-over narration over b-roll

Prompt
Aerial drone shot drifting over **misty pine mountains at dawn**, golden light breaking through fog. A calm, deep male voice narrates over the footage: "Some mornings, the world holds its breath." Slow continuous camera glide, cinematic, photoreal, 4K. Ambient noise: faint wind, distant birdsong. No background music.

Tweak: Keep the narration short. For an off-screen voice, describe it as narration "over the footage" rather than a character "saying" the line on screen.

Spoken product line (UGC ad style)

Prompt
Vertical 9:16 selfie shot. A **woman in her 30s in a bright kitchen** holds a **matte-green insulated water bottle** up to the camera, arm extended, slightly shaky handheld. She grins and says in a casual, excited voice, "Okay, this thing kept my coffee hot for nine hours — nine!" Natural window light, photoreal, authentic phone-camera look. No background music, no on-screen text.

Tweak: Swap the bold product and line. Keep it conversational and under ~14 words so the lip-sync stays clean in 8 seconds.

Crowd / multi-voice ambience

Prompt
Medium shot inside a **busy Italian trattoria at dinner**, warm pendant lights, a **chef in whites** plating pasta at the pass. Handheld camera drifts past tables. Photoreal, warm tungsten grade. Ambient noise: overlapping cheerful chatter in Italian, clinking cutlery, sizzling from the kitchen. No clear foreground dialogue, no background music.

Tweak: Use "overlapping chatter" rather than scripting individual lines when you want background crowd noise without a featured speaker.

Reaction line with timed beat

Prompt
Close-up of a **teenage boy at a desk** staring at a laptop, face lit by the screen. He reads silently for a beat, eyes widening, then says in a stunned voice, "No way. No way it actually worked." Static camera, moody desk-lamp light, photoreal. Ambient noise: quiet room tone, a soft laptop fan. No background music.

Tweak: The "reads silently for a beat" instruction buys a pause before the line. Use it so the delivery does not start the instant the clip opens.

Stand-up comedian tells a joke (self-generated dialogue)

Prompt
a man doing stand up comedy in a small venue tells a joke (include the joke in the dialogue)

Tweak: This is the minimalist community prompt that first showed Veo writing and delivering its own joke. To control the material, replace the parenthetical with your own line in quotes: says, "..."

Two characters in scripted back-and-forth dialogue

Prompt
a video with dialogue of two muffins while baking in an oven, the first muffin says "I can't believe this Veo 3 thing can do dialogue now!", the second muffin says "AAAAH, a talking muffin!"

Tweak: The proof-of-concept for two-speaker dialogue in one clip. Each speaker gets one short quoted line. Swap the muffins for your two characters and rewrite both lines; keep each under ~10 words so both fit the 8-second clip.

Spoken line mid-action (says while doing something)

Prompt
A man is running through a beautiful summer park at dawn, he is out of breath, he slows and stops, looks at the camera and says, while panting, "Run AI with an API. Use Replicate", then he carries on running.

Tweak: Note the structure: action, then 'looks at the camera and says, while panting, "..."', then more action. The delivery cue ("while panting") shapes the voice mid-motion. Swap the line and the activity.

Singing / musical delivery

Prompt
Medium shot of a **street busker with an acoustic guitar** on a sunlit cobblestone corner, small crowd gathered. He strums and sings a warm, gentle folk melody, eyes closed. Handheld camera slowly circles him, warm afternoon light, photoreal. SFX: acoustic guitar strumming, faint city ambience. The singing is the only vocal.

Tweak: Singing is less reliable than spoken dialogue. Describe the style and mood of the song rather than supplying exact lyrics, and expect to re-roll.

FAQ

How do I make a character actually speak in Veo?

Write the line in quotation marks after a speech verb and a comma — for example, A woman says, "We have to leave now." The comma and quotation marks are the cues that trigger lip-sync. Without them Veo may render the words as on-screen text or skip the dialogue entirely.
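
If you generate prompts from a script or spreadsheet, the pattern can be formatted and checked in code. This is a minimal sketch under stated assumptions: `speech_line`, `has_spoken_line`, and the list of speech verbs are hypothetical names chosen for illustration, and the regular expression only recognises straight double quotes, so curly quotes would need to be normalised first.

```python
import re

# A speech verb, an optional delivery cue, a comma, then a quoted line.
# The verb list is illustrative, not an official set of Veo triggers.
SPEECH_PATTERN = re.compile(
    r'\b(says|asks|shouts|whispers|replies|narrates)\b[^".]*,\s*"[^"]+"',
    re.IGNORECASE,
)


def speech_line(subject: str, line: str, verb: str = "says") -> str:
    """Format a spoken line as: subject verb, "line"."""
    return f'{subject} {verb}, "{line}"'


def has_spoken_line(prompt: str) -> bool:
    """True if the prompt contains a verb-comma-quote speech pattern."""
    return SPEECH_PATTERN.search(prompt) is not None


print(speech_line("A woman", "We have to leave now."))
print(has_spoken_line('A woman says "We have to leave now."'))  # comma missing
```

The check is deliberately strict about the comma, so a prompt that quotes a line without the verb-and-comma framing is flagged before it is sent for generation.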

Why does Veo show my dialogue as subtitles on screen?

That usually happens when the line is not clearly framed as speech, or when no negative is set. Use the says, "..." pattern and add "no on-screen text, no subtitles" to the prompt.
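
When prompts are assembled programmatically, the negatives are easy to append automatically. The helper below is a sketch, with `with_caption_negatives` as a hypothetical name; it adds only the negatives that are not already present, so a prompt that already ends with "no on-screen text" is not given a duplicate.

```python
NEGATIVES = ("no on-screen text", "no subtitles")


def with_caption_negatives(prompt: str) -> str:
    """Append any caption negatives the prompt does not already contain."""
    lowered = prompt.lower()
    missing = [n for n in NEGATIVES if n not in lowered]
    if not missing:
        return prompt
    clause = ", ".join(missing)
    # Capitalise the first letter so the clause reads as its own sentence.
    clause = clause[0].upper() + clause[1:] + "."
    return prompt.rstrip() + " " + clause


print(with_caption_negatives('She says, "Pull up a stool." No background music.'))
```

Running the function twice on the same prompt leaves it unchanged the second time, which makes it safe to apply as a final step in a pipeline.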

How long can a spoken line be?

Veo clips are 4, 6, or 8 seconds, so keep a single spoken line to roughly 10–15 words. Longer lines get rushed or cut off. For a conversation, give each character one short line.
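
A quick word count catches overlong lines before you spend a generation on them. The sketch below treats every span in straight double quotes as a spoken line, which is an assumption: a quoted phrase that is not dialogue would be counted too. The function names and the default limit of 15 (the upper end of the rough guide above) are illustrative.

```python
import re


def quoted_lines(prompt: str) -> list[str]:
    """Return every span wrapped in straight double quotes."""
    return re.findall(r'"([^"]+)"', prompt)


def word_count(line: str) -> int:
    """Count word-like tokens; stray punctuation such as dashes is ignored."""
    return len(re.findall(r"\w+(?:'\w+)*", line))


def overlong_lines(prompt: str, max_words: int = 15) -> list[str]:
    """Quoted lines whose word count exceeds max_words."""
    return [q for q in quoted_lines(prompt) if word_count(q) > max_words]


prompt = 'He reads silently, then says, "No way. No way it actually worked."'
print(word_count(quoted_lines(prompt)[0]))
print(overlong_lines(prompt))
```

For a two-speaker prompt, lowering `max_words` to around 10 matches the tighter per-line budget suggested for conversations.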

How do I control the voice or emotion?

Add a delivery cue before the quote: says in a weary voice, shouts excitedly, whispers nervously. Veo shapes the vocal performance from these descriptors.
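
To compare deliveries, it can help to generate several variants of one line that differ only in the cue. This is a small sketch: `with_delivery` is a hypothetical helper, and the cues are the ones suggested in the emotional-delivery prompt in this deck.

```python
def with_delivery(subject: str, cue: str, line: str, verb: str = "says") -> str:
    """Place the delivery cue between the speech verb and the quoted line."""
    return f'{subject} {verb} {cue}, "{line}"'


CUES = [
    "in a hushed, awestruck whisper",
    "in a trembling, panicked voice",
    "in a flat, exhausted voice",
]

# Same subject and line each time; only the vocal direction changes.
variants = [
    with_delivery("She", cue, "I never thought I'd actually see it.")
    for cue in CUES
]
for v in variants:
    print(v)
```

Holding the line constant while changing one cue at a time makes it easier to tell which descriptor actually changed the performance.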

Does native audio work on Veo 3.1 Lite and the free tier?

Veo 3.1 and 3.1 Lite generate native synchronized audio. Note that the Veo image-editing "add/remove object" path runs on Veo 2 and does not generate audio — verify your access tier before relying on dialogue.