
OpenCV founders launch AI video startup to take on OpenAI and Google

A new artificial intelligence startup founded by the creators of the world’s most widely used computer vision library has emerged from stealth with technology that generates realistic, human-centric videos of up to five minutes — a dramatic leap beyond the capabilities of rivals, including OpenAI’s Sora and Google’s Veo.

CraftStory, which launched Tuesday with $2 million in funding, introduces Model 2.0, a video generation system that addresses one of the key limitations facing the emerging AI video industry: duration. While OpenAI’s Sora 2 tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory’s system can produce continuous, coherent video performances that last as long as a typical YouTube tutorial or product demonstration.

The breakthrough could deliver substantial commercial value to companies struggling to scale video production for training, marketing and customer education – markets where short, AI-generated clips have proven inadequate despite their visual polish.

“If you actually try to make a video with one of these video generation systems, you’ll often find that you want to implement a certain creative vision, and no matter how detailed the instructions are, the systems actually ignore some of your instructions,” said Victor Erukhimov, founder and CEO of CraftStory, in an exclusive interview with VentureBeat. “We have developed a system that can generate videos for as long as you need them.”

How parallel processing solves the long-video problem

CraftStory’s advance relies on what the company describes as a parallel diffusion architecture – a fundamentally different way of generating video from the sequential methods used by most competitors.

Traditional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes with time representing the third axis. To generate a longer video, these models require proportionately larger networks, more training data, and significantly more computing resources.

CraftStory instead runs multiple smaller diffusion processes simultaneously across the entire duration of the video, with bidirectional constraints connecting them. “The last part of the video can also influence the first part of the video,” Erukhimov explained. “And this is quite important, because if you do it one by one, an artifact that appears in the first part propagates to the second, and then it piles up.”


Instead of generating eight seconds and then adding additional segments, the CraftStory system processes all five minutes simultaneously through interconnected diffusion processes.
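The contrast between the two approaches can be sketched with a toy model. The snippet below is illustrative only, not CraftStory’s actual algorithm: it reduces each video chunk’s “artifact level” to a single number, amplifies it chunk-to-chunk in the sequential case, and in the parallel case refines all chunks jointly, averaging each chunk with both neighbors (a stand-in for bidirectional constraints) and applying a mild denoising decay.

```python
def sequential_artifacts(num_chunks, seed_artifact=0.1, amplification=1.5):
    """Autoregressive generation: each chunk conditions only on the
    previous one, so an artifact in chunk 0 is inherited and amplified."""
    errors = [seed_artifact]
    for _ in range(num_chunks - 1):
        errors.append(errors[-1] * amplification)
    return errors

def parallel_artifacts(num_chunks, seed_artifact=0.1, iters=20):
    """Joint refinement: every chunk is updated simultaneously toward the
    average of itself and both neighbors, so clean later chunks pull an
    early artifact back toward zero instead of inheriting it."""
    errors = [0.0] * num_chunks
    errors[0] = seed_artifact  # inject an artifact in the first chunk
    for _ in range(iters):
        nxt = []
        for i in range(num_chunks):
            left = errors[i - 1] if i > 0 else errors[i]
            right = errors[i + 1] if i < num_chunks - 1 else errors[i]
            # bidirectional smoothing plus a mild denoising decay
            nxt.append(0.9 * (left + errors[i] + right) / 3)
        errors = nxt
    return errors

seq = sequential_artifacts(8)
par = parallel_artifacts(8)
```

In this toy setup the sequential error grows geometrically across chunks, while the jointly refined version shrinks toward zero, mirroring Erukhimov’s point that information flowing backward from later chunks keeps early artifacts from piling up.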

Crucially, CraftStory trained its model on its own footage rather than relying solely on videos scraped from the internet. The company hired studios to capture actors using high frame rate camera systems that capture sharp detail even in fast-moving elements like fingers, avoiding the motion blur inherent in standard 30 frames per second YouTube clips.

“What we showed is that you don’t need a lot of data or a big training budget to create high-quality videos,” said Erukhimov. “You just need high-quality data.”

Model 2.0 currently works as a video-to-video system: users upload a still image to animate and a “driving video” featuring a person whose movements the AI will replicate. CraftStory offers preset driving videos recorded with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.

The system generates low-resolution 30-second clips in about 15 minutes. An advanced lip sync system synchronizes mouth movements with scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.

Fighting a war chest battle with $2 million versus billions

CraftStory is funded almost entirely by Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise stands in stark contrast to the billions flowing into competing efforts: OpenAI raised more than $6 billion in its latest funding round alone.

Erukhimov pushed back on the idea that large-scale capital is a prerequisite for success. “I don’t necessarily believe in the statement that computing is the path to success,” he said. “It certainly helps if you have computing power. But if you raise a billion dollars with a PowerPoint, ultimately no one is happy, neither the founders nor the investors.”

Filev defended the David-versus-Goliath approach. “When you invest in startups, you are fundamentally investing in people,” he said in an interview with VentureBeat. “To paraphrase Margaret Mead, never underestimate what a small group of thoughtful, dedicated engineers and scientists can build.”


He argued that CraftStory benefits from a focused strategy. “The major laboratories are engaged in an arms race to build general-purpose foundation video models,” said Filev. “CraftStory rides that wave and delves deeply into a specific format: long, engaging, human-centered video.”

Why computer vision expertise is important in generative AI video

Erukhimov’s credibility comes from his deep roots in computer vision rather than the transformer architectures that have dominated recent AI developments. He was an early contributor to OpenCV — the Open Source Computer Vision Library that has become the de facto standard for computer vision applications, with more than 84,000 stars on GitHub.

When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez for the express purpose of preserving and advancing the library. The company significantly expanded OpenCV and focused on automotive safety systems before Intel acquired it in 2016.

Filev said this background is exactly what makes Erukhimov well-positioned for video generation. “What people sometimes miss is that generative AI video is not just about the generative part, but about understanding movement, facial dynamics, temporal coherence and how people actually move,” said Filev. “Victor has mastered exactly those problems throughout his career.”

Enterprise focus is on training videos and product demos

While much of the public excitement around AI video generation has focused on consumer creative tools, CraftStory is pursuing a decidedly enterprise-focused strategy.

“We definitely think more about B2B than consumer,” Erukhimov said. “We’re thinking about companies, especially software companies, being able to create cool training videos, product videos and launch videos.”

The logic is simple: corporate training, product tutorials and customer education videos often last several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature.

“If you need a longer video, you should come with us,” Erukhimov said. “We can create consistent videos of up to five minutes in high quality.”

Filev echoed this assessment. “A major gap in this market is the lack of models that can generate consistent videos over longer sequences – and that is extremely important for real-world use,” he said. “If you’re making a commercial for your company, a 10-second video, no matter how good it looks, isn’t enough. You need 30 seconds, you need two minutes, you need more.”


The company expects cost savings for customers. Filev suggested that “a small business owner could create content in minutes that previously would have cost $20,000 and taken two months to produce.”

CraftStory also appeals to creative agencies producing video content for enterprise clients, where the value proposition focuses on cost and speed: agencies can capture an actor on camera and turn that footage into a finished AI video, rather than managing expensive multi-day shoots.

The next major development on CraftStory’s roadmap is a text-to-video model that allows users to generate long-form content directly from scripts. The team is also developing support for moving camera scenarios, including the popular walk-and-talk format common in high-end advertising.

Where CraftStory fits into a fragmented competitive landscape

CraftStory enters a crowded and rapidly evolving market. OpenAI’s Sora 2, although not yet publicly available, has caused quite a stir. Google’s Veo models are advancing quickly. Runway, Pika and Stability AI all offer video generation tools with differing capabilities.

Erukhimov acknowledged the competitive pressure but emphasized that CraftStory serves a distinct niche focused on human-centric video. He positioned rapid innovation and market capture as the company’s primary strategy, rather than relying on technical moats.

Filev sees the market fragmenting into different tiers, with large tech companies serving as “API providers of powerful general-purpose generation models,” while specialized players like CraftStory focus on specific use cases. “When the big players build the engines, CraftStory builds the production studio and assembly line on top of that,” he said.

Model 2.0 is now available at app.craftstory.com/model-2.0, with the company offering early access to users and companies interested in testing the technology. Whether a lightly funded startup can capture meaningful market share against deep-pocketed incumbents remains uncertain, but Erukhimov is generally confident in the opportunities ahead.

“AI-generated video will soon become the primary way companies communicate their stories,” he said.
