Veo Dialogue & Audio Prompts

These are tested Veo prompts for dialogue, lip-sync, sound effects, and ambient audio, the part most AI video models still get wrong. To make a character speak, write the line in quotation marks after a speech verb and a comma: A woman says, "We have to leave now." That comma-and-quotes pattern is what triggers Veo’s lip-sync; without it, the words often appear as on-screen text instead. Copy any prompt below, paste it into Veo, and change the bold details. This cluster is one part of the full Veo Prompt Library, which also covers product, UGC, and image-to-video prompts.

Why Veo’s audio is different

Veo generates picture and sound together in one pass, not as a soundtrack bolted on afterward. That joint process is why lip movements line up with the words and why ambient sounds match the scene. It also means you should prompt audio as its own layer of the scene — name the dialogue, the sound effects, and the ambience explicitly — rather than tacking “with sound” onto the end of a visual prompt and hoping.

How to write Veo audio prompts

Dialogue: [Character] says, "[line]" — keep the comma and the quotation marks. Add a voice cue before the quote (says in a calm, low voice) to direct the performance.
Sound effects: prefix with SFX: and name each sound precisely — SFX: the sharp clang of hammer on steel, not “blacksmith sounds”.
Ambient: prefix with Ambient noise: for the environmental bed — rain, room tone, distant traffic.
Protect the lip-sync: add "No background music" when dialogue matters; Veo can add music by default, and it competes with speech.
Stop caption bleed: add "no on-screen text, no subtitles" if Veo keeps printing your words on screen.
Mind the clock: clips are 4, 6, or 8 seconds. Keep one short line per character so it is not rushed or truncated.
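
The checklist above can be sketched as a small prompt assembler. This is a minimal Python sketch, not part of Veo or any official SDK: the function name build_audio_prompt and all of its parameters are hypothetical, and it only formats text that you then paste into Veo.

```python
def build_audio_prompt(visual, speaker="", line="", voice_cue="",
                       sfx=(), ambient=(), no_music=True, no_captions=True):
    """Assemble a prompt with dialogue, SFX, ambience, and negatives as separate layers.

    Hypothetical helper for illustration; it builds a string and calls no API.
    """
    parts = [visual.strip()]
    if speaker and line:
        # Optional delivery cue goes before the quote, e.g. "in a calm, low voice".
        verb = f"says {voice_cue}" if voice_cue else "says"
        # Keep the comma and the quotation marks around the spoken line.
        parts.append(f'{speaker} {verb}, "{line}"')
    if sfx:
        parts.append("SFX: " + ", ".join(sfx) + ".")
    if ambient:
        parts.append("Ambient noise: " + ", ".join(ambient) + ".")
    negatives = []
    if no_music:
        negatives.append("no background music")
    if no_captions:
        negatives.append("no on-screen text, no subtitles")
    if negatives:
        text = ", ".join(negatives)
        parts.append(text[0].upper() + text[1:] + ".")
    return " ".join(parts)


prompt = build_audio_prompt(
    "Close-up of a barista behind a coffee counter.",
    speaker="She",
    line="This one's on the house.",
    voice_cue="in a warm, upbeat voice",
    ambient=("quiet cafe room tone",),
)
print(prompt)
```

Keeping each audio layer as its own argument makes it harder to forget the negatives, and the spoken line stays a single short string you can check against the clip length.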

Ready to script a scene? Build it in the Veo Prompt Builder — the dialogue preset starts from these same says-quote and no-subtitles defaults. If the delivery is more casual than scripted, the UGC-style prompts show the handheld, selfie-vlog version of a spoken line, and the product video prompts show how to add a short spoken hook to an ad.

Got your prompt? Run it on a model with native audio

These prompts need a Veo-capable runner. If you do not have direct Veo access, Pollo AI lets you run Veo and other video models in one place, so you can test a dialogue prompt and re-roll without juggling accounts. Disclosure: this is an affiliate link — we may earn a commission if you subscribe, at no extra cost to you. We only suggest tools we would use to run these prompts.

Pillar: Veo Prompt Library — all use cases in one place.
Veo Image-to-Video Prompts — add motion and audio to a still image.
Veo UGC-Style Prompts — selfie-vlog and testimonial formats that lean on spoken delivery.
Veo Product Video Prompts — hero shots and ads.

Prompt deck

Copy a format, check the evidence, then customize it.

20 prompts · 10 evidenced · 10 community · 0 owner-tested

Veo / Dialogue / talking / lip-sync

Historical figure explains a concept to camera (Pythagoras)

Why it works

Naming a well-known figure and a concept gives Veo enough context to generate period-appropriate speech, setting, and delivery without you writing a script. It suggests that native audio can carry an entire explanatory monologue from a one-line brief.

Prompt

Pythagoras explaining his theorem, in ancient Greece

Two-person conversation (lip-synced dialogue)

Prompt

Medium two-shot in a dim 1940s detective office, rain streaking the window. A **weary middle-aged detective in a rumpled grey suit** sits behind the desk; a **young woman in a red coat** stands in the doorway. The detective looks up and says in a tired, gravelly voice, "Of all the offices in this town, you had to walk into mine." Static camera, soft low-key lamp light, shallow depth of field, film-noir grade. Ambient noise: faint rain on glass, a ticking clock. No background music.

Single character talking to camera (clean lip-sync)

Prompt

Close-up of a **friendly barista in her late 20s with curly hair**, standing behind a coffee counter, looking directly into the lens. She smiles and says in a warm, upbeat voice, "Pull up a stool — this one's on the house." Static handheld feel, soft window daylight, photoreal, 4K. No background music, no on-screen text, no subtitles.

Scripted rap-battle dialogue between two characters (dual accents, lip-sync)

Why it works

Naming a distinct accent and subject per speaker ("British accent about gravity", "German accent about relativity") gives Veo two clearly separable voice targets, which is why the lip-sync and the back-and-forth timing hold up even with a long, lyrical script.

Prompt

A high-energy rap battle between Isaac Newton and Albert Einstein on a futuristic sci-fi stage. The camera alternates between close-ups and dramatic wide shots as they diss each other with sharp lyrics. Newton, in a classic 17th-century outfit, raps with a British accent about gravity and apples. Einstein, with wild hair and a German accent, fires back about relativity and space-time. Their lip-sync is perfectly timed to the beat, and their facial expressions are intense and animated. The background pulses with neon lights and holographic equations, reacting to the rhythm. The crowd of AI-generated scientists cheers them on in sync with the music. It feels like a rap battle from another dimension.

Musical performance with full vocal delivery (opera singer)

Why it works

A minimal setup ("an opera singer singing on stage") gives Veo room to generate a full musical performance with confident timing, showing that native audio extends to sustained singing, not only short spoken lines. That makes it useful evidence for the singing/musical prompt below, which is otherwise untested.

Prompt

an opera singer singing on stage.

Narrated voice-over with in-scene action (streamer commentary)

Why it works

Framing the scene as an achievement moment ("getting a victory royale") cues Veo to generate reactive, in-character commentary rather than a flat description. It is a reusable pattern for any UGC-style clip that lays voice-over on top of action.

Prompt

Streamer getting a victory royale with just his pickaxe

Observational narration over a scene (mockumentary voice)

Why it works

Describing two roles and a camera pan ("pans over to...taking notes") in one sentence gives Veo both a speaker and an audience reaction to render, which is why this format reliably produces layered classroom dialogue instead of one flat voice.

Prompt

A college professor doing a class on Gen Z slang and the video pans over to all the boomers taking notes and seeming super interested

Emotional delivery (voice direction before the line)

Prompt

Medium close-up of an **astronaut in a worn flight suit** inside a cramped capsule, soft instrument glow on her face. She stares out the porthole and says, in a hushed, awestruck whisper, "I never thought I'd actually see it." Slow push-in, cinematic low-key lighting, photoreal. Ambient noise: the quiet hum of the capsule. No background music.

Mouth-sound SFX driving the visuals (voice-artist sound design)

Why it works

Specifying who or what makes each sound, rather than just naming the sound, gives Veo a consistent audio character to render across many quick cuts, which is why the sound-to-visual sync holds together over a fast-paced, multi-scene sequence instead of drifting.

Prompt

A dynamic camera glides through a miniature LEGO world, where an epic adventure unfolds. All sound effects—footsteps, explosions, cars, dragons—are created using mouth sounds by a single AI-generated voice artist. As each sound is made, the visuals instantly respond: LEGO characters jump into action, cars race, spaceships take off, volcanoes erupt. The journey moves through LEGO-built environments—city streets, underwater ruins, space stations, and lava lairs. The video is fast-paced, playful, and visually rich, like a blend between The LEGO Movie and next-gen AI storytelling. The sound-to-visual sync creates a magical, toy-driven universe where imagination controls reality.

Sung performance with crowd reaction (character + setting + song topic)

Why it works

Giving the song a specific, absurd topic ("how things used to be in the Mesozoic Era") rather than "singing a song" gives Veo actual content to perform, and scripting the crowd's reaction closes the loop so the clip reads as a complete bit rather than a performance that just stops.

Prompt

A dinosaur with a white fedora and a Hawaiian shirt playing an acoustic guitar on stage at a small waterside bar in Puerto Rico. The dinosaur is singing about how things used to be in the Mesozoic Era. People are clapping and laughing.

Sound-effects-led scene (no dialogue)

Prompt

A **blacksmith in a leather apron** hammers a glowing orange blade on an anvil in a dark forge. Sparks fly with each strike. Slow tracking shot around the anvil, warm firelight, photoreal, slow motion on the sparks. SFX: the sharp clang of hammer on steel, the hiss of hot metal, crackling embers. Ambient noise: low roar of the forge fire. No background music, no dialogue.

Ambient soundscape (mood without speech)

Prompt

Wide establishing shot of a **rain-soaked Tokyo alley at night**, neon signs reflected in puddles, steam rising from a vent. Slow forward dolly down the alley. Cinematic, moody, photoreal, anamorphic look. Ambient noise: steady rain, distant traffic, the buzz of a flickering neon sign, a faint train passing. No dialogue, no background music.

Voice-over narration over b-roll

Prompt

Aerial drone shot drifting over **misty pine mountains at dawn**, golden light breaking through fog. A calm, deep male voice narrates over the footage: "Some mornings, the world holds its breath." Slow continuous camera glide, cinematic, photoreal, 4K. Ambient noise: faint wind, distant birdsong. No background music.

Spoken product line (UGC ad style)

Prompt

Vertical 9:16 selfie shot. A **woman in her 30s in a bright kitchen** holds a **matte-green insulated water bottle** up to the camera, arm extended, slightly shaky handheld. She grins and says in a casual, excited voice, "Okay, this thing kept my coffee hot for nine hours — nine!" Natural window light, photoreal, authentic phone-camera look. No background music, no on-screen text.

Crowd / multi-voice ambience

Prompt

Medium shot inside a **busy Italian trattoria at dinner**, warm pendant lights, a **chef in whites** plating pasta at the pass. Handheld camera drifts past tables. Photoreal, warm tungsten grade. Ambient noise: overlapping cheerful chatter in Italian, clinking cutlery, sizzling from the kitchen. No clear foreground dialogue, no background music.

Reaction line with timed beat

Prompt

Close-up of a **teenage boy at a desk** staring at a laptop, face lit by the screen. He reads silently for a beat, eyes widening, then says in a stunned voice, "No way. No way it actually worked." Static camera, moody desk-lamp light, photoreal. Ambient noise: quiet room tone, a soft laptop fan. No background music.

Stand-up comedian tells a joke (self-generated dialogue)

Prompt

a man doing stand up comedy in a small venue tells a joke (include the joke in the dialogue)

Two characters in scripted back-and-forth dialogue

Prompt

a video with dialogue of two muffins while baking in an oven, the first muffin says "I can't believe this Veo 3 thing can do dialogue now!", the second muffin says "AAAAH, a talking muffin!"

Spoken line mid-action (says while doing something)

Prompt

A man is running through a beautiful summer park at dawn, he is out of breath, he slows and stops, looks at the camera and says, while panting, "Run AI with an API. Use Replicate", then he carries on running.

Singing / musical delivery

Prompt

Medium shot of a **street busker with an acoustic guitar** on a sunlit cobblestone corner, small crowd gathered. He strums and sings a warm, gentle folk melody, eyes closed. Handheld camera slowly circles him, warm afternoon light, photoreal. SFX: acoustic guitar strumming, faint city ambience. The singing is the only vocal.

FAQ

How do I make a character actually speak in Veo?

Write the line in quotation marks after a speech verb and a comma — for example, A woman says, "We have to leave now." The comma and quotation marks are the cues that trigger lip-sync. Without them Veo may render the words as on-screen text or skip the dialogue entirely.

Why does Veo show my dialogue as subtitles on screen?

That usually happens when the line is not clearly framed as speech, or when the prompt has no negative instruction ruling out text. Use the says, "..." pattern and add "no on-screen text, no subtitles" to the prompt.
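
Both causes can be checked mechanically before you generate. The following is a rough Python sketch, not an official Veo tool: the function name lint_dialogue_prompt, the list of speech verbs, and the warning wording are all assumptions made for illustration.

```python
import re

# Assumed set of speech verbs; extend it to match the verbs you actually use.
# Pattern: speech verb, optional delivery cue, then a comma right before the opening quote.
SPEECH_PATTERN = re.compile(
    r'\b(says|shouts|whispers|asks|replies)\b[^."“]*,\s*["“]',
    re.IGNORECASE,
)


def lint_dialogue_prompt(prompt):
    """Return a list of warnings for a prompt that contains a quoted line."""
    warnings = []
    has_quote = '"' in prompt or "“" in prompt
    if has_quote and not SPEECH_PATTERN.search(prompt):
        warnings.append('quoted line is not framed as: <speech verb>, "..."')
    lowered = prompt.lower()
    if has_quote and "no subtitles" not in lowered and "no on-screen text" not in lowered:
        warnings.append('missing negative: "no on-screen text, no subtitles"')
    return warnings


good = ('She smiles and says in a warm, upbeat voice, "This one\'s on the house." '
        "No background music, no on-screen text, no subtitles.")
bad = 'The first muffin says "A talking muffin!"'
print(lint_dialogue_prompt(good))
print(lint_dialogue_prompt(bad))
```

A warning here only means the prompt departs from the says, "..." convention; it does not predict what Veo will render.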

How long can a spoken line be?

Veo clips are 4, 6, or 8 seconds, so keep a single spoken line to roughly 10–15 words. Longer lines get rushed or cut off. For a conversation, give each character one short line.
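
One way to catch overlong lines early is a quick word count on each quoted span. This is a rough sketch in Python that assumes every double-quoted span in the prompt is spoken dialogue and uses 15 words, the upper end of the guideline above, as the cutoff; the helper names are hypothetical.

```python
import re

MAX_WORDS = 15  # upper end of the rough 10-15 word guideline; adjust to taste


def spoken_lines(prompt):
    """Return every double-quoted span in the prompt (straight or curly quotes)."""
    return re.findall(r'["“]([^"”]+)["”]', prompt)


def overlong_lines(prompt, max_words=MAX_WORDS):
    """Return the quoted lines whose whitespace-separated word count exceeds max_words."""
    return [line for line in spoken_lines(prompt) if len(line.split()) > max_words]


prompt = 'The detective says, "Of all the offices in this town, you had to walk into mine."'
for line in overlong_lines(prompt):
    print("Too long for one clip:", line)
```

The count is only a proxy for speaking time, so treat a flagged line as a cue to trim or split it, not as a hard limit.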

How do I control the voice or emotion?

Add a delivery cue before the quote: says in a weary voice, shouts excitedly, whispers nervously. Veo shapes the vocal performance from these descriptors.

Does native audio work on Veo 3.1 Lite and the free tier?

Veo 3.1 and 3.1 Lite generate native synchronized audio. Note that the Veo image-editing "add/remove object" path runs on Veo 2 and does not generate audio — verify your access tier before relying on dialogue.