A stunningly lifelike video of a standup comedian delivering a joke, lips synced perfectly to the voice, is just one of several clips going viral after being generated by Google’s latest AI tool.
From that uncanny standup act to a musical number featuring actors singing an ode to garlic bread, the newly released examples of Veo 3’s capabilities are fuelling both excitement and unease across the internet.
“I got chills down the base of my spine when I first saw this, because the usual tells that AI generated this video were nowhere on display and impossible to see,” says technology analyst Carmi Levy.
Say goodbye to the silent era of video generation: Introducing Veo 3 — with native audio generation. 🗣️
Quality is up from Veo 2, and now you can add dialogue between characters, sound effects and background noise. Veo 3 is available now in the @GeminiApp for Google AI Ultra…
Google (@Google), May 20, 2025
Unlike earlier text-to-video tools, which often produced pixelated, jerky content, Veo 3 boasts accurate lip syncing, high-quality visuals, and the ability to generate synchronized audio, including speech, singing, sound effects and ambient noise.
“It makes the process much simpler. It means the final output is much more realistic both visually and from a sound perspective,” says Levy. “It means that anyone, even if they don’t have a lot of skills, can create stunningly realistic videos almost from scratch.”
Currently, Veo 3 is available only in the United States, and only to subscribers of Google's US$249.99-a-month AI Ultra subscription tier. According to early users, its results are impressive, especially given the short history of public text-to-video generation, which emerged only in 2022.
“It’s stunningly realistic, and you literally don’t know that it isn’t real. That is frightening on a whole number of levels and in the wrong hands this technology can wreak a lot of havoc,” says Levy.
One montage features AI-generated people repeating phrases like “we can talk” and “what should we talk about?”, triggering concerns over how these tools could be used to mislead viewers.
“It’s a significant leap forward, not just in the addition of sound but in how faithfully it follows the prompt it’s given,” says Mark Daley, Chief AI Officer at Western University.
Google has not disclosed what data was used to train Veo 3, but experts suspect it includes content from YouTube, which Google owns.
“Neural nets require lots of data to train. So if you want a model that’s really good at creating images and sound, you need a huge data set of images and sound, and YouTube is probably the biggest one on the planet,” says Daley.
Aengus Bridgman is director of the Media Ecosystem Observatory, which focuses on safeguarding Canada’s digital information environment. He warns that the stakes are high, especially as the technology becomes increasingly difficult to distinguish from real footage.
“There’s a lot of fun that can be had, but actually using it to mislead and disinform people is always the real concern,” says Bridgman. “It’s almost going to be impossible for the lay person to really distinguish.”
He advises people to be skeptical of unfamiliar sources.
“Follow content creators or personalities or people that you trust. Who you know are going to do some sort of verification on this,” he says.
“You can definitely do sort of a forensic analysis. There tends to be patterns in the data that look a little bit inconsistent. But the problem is, you often need the raw file, and you need some computation to go through and really investigate that.”
While Google is positioning Veo 3 as a breakthrough for creators, some fear it could disrupt the film and television industry and lead to job losses.
“It’s only a matter of time before an entirely AI generated film is released,” says Levy.