
Google releases new AI video model Veo 3.1 in Flow and API: what it means for enterprises

As expected after days of leaks and rumors online, Google has unveiled Veo 3.1, its latest AI video generation model, offering a range of creative and technical upgrades aimed at improving story control, audio integration and realism in AI-generated video.

While the updates expand options for hobbyists and content creators using Flow, Google’s online AI creation app, the release also signals a growing opportunity for enterprises, developers and creative teams looking for scalable, customizable video tools.

The quality is higher, the physics are better, the price is unchanged from before, and the control and editing functions are more robust and varied.

My first tests showed it to be a powerful, high-performing model that should appeal to a wide range of users. However, its look is more cinematic, polished and slightly more ‘artificial’ than rivals like OpenAI’s new Sora 2, released late last month, which may or may not be what a given user is looking for (Sora excels at handheld, ‘candid’-style video).

Extensive control over story and audio

Veo 3.1 builds on its predecessor, Veo 3 (released May 2025), with improved support for dialogue, ambient sound and other audio effects.

Native audio generation is now available for several key features in Flow, including ‘Frames to Video’, ‘Ingredients to Video’ and ‘Extend’, which respectively allow users to: convert still images into video; combine items, characters and objects from multiple images in one video; and generate clips beyond the initial 8 seconds, up to 30+ seconds or even more than a minute if you continue from the last frame of a previous clip.

Previously, you had to manually add audio after using these features.

This addition gives users greater control over tone, emotion and storytelling – capabilities that previously required post-production work.

In enterprise contexts, this level of control can reduce the need for separate audio pipelines, providing an integrated way to create training content, marketing videos or digital experiences with synchronized sound and visuals.

Google noted in a blog post that the updates reflect user feedback calling for deeper artistic control and improved audio support. Gallegos emphasized the importance of making edits and refinements possible directly in Flow, without having to rebuild scenes from scratch.


Richer input and editing options

With Veo 3.1, Google introduces support for multiple input types and more granular control over the generated output. The model accepts text prompts, images and video clips as input and also supports:

  • Reference images (maximum three) to guide the look and style in the final output

  • First and last frame interpolation to generate seamless scenes between fixed endpoints

  • Scene extension, which continues the action or motion of a video beyond its current duration

These tools aim to give business users a way to refine the look and feel of their content, which is useful for brand consistency or adhering to creative briefs.

Additional capabilities such as “Insert” (adding objects to scenes) and “Remove” (removing elements or characters) are also introduced, although not all of them are immediately available via the Gemini API.

Availability across platforms

Veo 3.1 can be accessed through several existing Google AI services:

  • Flow, Google’s own interface for AI-assisted filmmaking

  • Gemini API, aimed at developers who build video capabilities into applications

  • Vertex AI, where enterprise integration will soon support Veo’s “Scene Extension” and other key features

Availability through these platforms allows enterprise customers to choose the right environment (GUI-based or programmatic) based on their teams and workflows.

Pricing and access

The Veo 3.1 model is currently in preview and only available on the paid tier of the Gemini API. The cost structure is the same as for Veo 3, Google’s previous generation of AI video models:

  • Standard model: $0.40 per second of video

  • Fast model: $0.15 per second

There is no free tier, and users pay only when a video is successfully generated. This pricing is consistent with previous Veo versions and offers predictability for budget-conscious business teams.
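Based on the per-second rates above, here is a minimal sketch of how a team might budget for generation costs; the function name and structure are illustrative, and only the rates ($0.40 and $0.15 per second) and the durations come from this article:

```python
# Illustrative cost estimator using the Veo 3.1 per-second rates quoted above.
# Billing applies only to successfully generated video; there is no free tier.

RATES_PER_SECOND = {
    "standard": 0.40,  # standard model, USD per second
    "fast": 0.15,      # fast model, USD per second
}

def estimate_cost(duration_seconds: float, tier: str = "standard") -> float:
    """Return the estimated cost in USD for one generated clip."""
    return round(duration_seconds * RATES_PER_SECOND[tier], 2)

# A default 8-second clip on each tier:
print(estimate_cost(8, "standard"))  # → 3.2
print(estimate_cost(8, "fast"))      # → 1.2

# A fully extended 148-second clip on the standard tier:
print(estimate_cost(148, "standard"))  # → 59.2
```

At roughly $59 for a maximally extended standard-tier clip, per-second billing makes clip length the dominant cost driver, which is worth factoring into any automated pipeline.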

Technical specifications and output control

Veo 3.1 outputs video at 720p or 1080p resolution, at a frame rate of 24 fps.


Duration options include 4, 6 or 8 seconds from a text prompt or uploaded images, with the ability to extend videos to 148 seconds (nearly two and a half minutes) using the “Extend” function.

New functionality also includes tighter control over subjects and environments. For example, companies can upload a product image or visual reference, and Veo 3.1 will generate scenes that preserve its look and stylistic cues in the video. This could streamline creative production pipelines for retail, advertising and virtual content teams.

First reactions

The broader community of makers and developers has reacted to the launch of Veo 3.1 with a mix of optimism and tempered criticism, especially when compared to competing models such as OpenAI’s Sora 2.

Matt Shumer, founder of Otherside AI/Hyperwrite and an early adopter, described his initial reaction as “disappointment,” noting that Veo 3.1 is “noticeably worse than Sora 2” and also “a lot more expensive.”

However, he acknowledged that Google’s tooling, such as support for reference images and scene extension, is a bright spot in the release.

Travis Davids, a digital 3D artist and AI content creator, echoed some of that sentiment. While he noted improvements in audio quality, especially in sound effects and dialogue, he expressed concerns about the limitations that remain in the system.

These include the lack of custom voice support, the inability to directly select generated voices, and the continued 8-second limit on single generations, despite some public claims about longer output.

Davids also pointed out that character consistency across changing camera angles still requires careful prompting, while other models like Sora 2 handle this more automatically. He also questioned the lack of 1080p resolution for users on paid tiers like Flow Pro and expressed skepticism about feature parity.

On the more positive side, @kimmonimus, an AI newsletter writer, stated that “Veo 3.1 is great,” but still concluded that OpenAI’s latest model remains the preferred choice overall.


Collectively, these early impressions suggest that while Veo 3.1 offers meaningful tooling improvements and new creative control features, expectations have shifted as competitors raise the bar on both quality and usability.

Adoption and scale

Since the launch of Flow five months ago, Google says about 275 million videos have been generated across its Veo models.

The pace of adoption indicates significant interest not only from individuals, but also from developers and companies experimenting with automated content creation.

Thomas Iljic, Director of Product Management at Google Labs, emphasized that the release of Veo 3.1 brings its capabilities closer to the way human filmmakers plan and shoot. These include scene composition, continuity between shots, and coordinated audio—all areas that companies are increasingly looking to automate or streamline.

Safety and responsible AI use

Videos generated with Veo 3.1 are watermarked using Google’s SynthID technology, which embeds an imperceptible identifier to indicate that the content is AI-generated.

Google applies security filters and moderation to all its APIs to minimize privacy and copyright risks. Generated content is stored temporarily and deleted after two days unless downloaded.

For developers and enterprises, these features provide certainty about provenance and compliance, which is critical in regulated or brand-sensitive industries.

Where Veo 3.1 stands in a busy AI video model space

Veo 3.1 isn’t just a rehash of previous models; it represents a deeper integration of multimodal inputs, storytelling control, and enterprise-level tools. While creative professionals can see immediate benefits in editing workflows and reliability, companies exploring automation in training, advertising, or virtual experiences can find even more value in the model’s composability and API support.

Early user feedback shows that while Veo 3.1 offers valuable tools, expectations around realism, voice control and generation time are rapidly evolving. As Google expands access through Vertex AI and continues to refine Veo, its competitive position in enterprise video generation will depend on how quickly these user pain points are addressed.
