China-developed Text-to-video Large Model Launched for Global Users
TMTPOST--Vidu, a text-to-video large model that can generate high-definition 1080p videos of up to 16 seconds with a single click, has officially launched for global users.
Developed by Tsinghua University and Chinese AI firm Shengshu Technology, Vidu is China's first large video AI model with "extended duration, exceptional consistency and dynamic capabilities."
As a large model developed in China, Vidu can understand and generate distinctly Chinese content such as the panda and the loong (the Chinese dragon), according to Zhu Jun, deputy director of the Tsinghua Institute for Artificial Intelligence.
In addition to high dynamic range, realism, and consistency, the product launched on Tuesday adds new features such as “Character to Video,” anime styles, and text and special effects generation.
Vidu boasts the industry's fastest inference speed, generating a 4-second clip in just 30 seconds, according to Shengshu Technology. Users can start using Vidu by registering with an email address; no separate application is required.
Vidu offers a free version and three paid subscription plans. The free version includes 80 points per month, enough to create 20 four-second videos. Monthly subscriptions are available in Standard, Premium, and Deluxe plans, priced at $19.99, $59.99, and $199.99 and carrying 240, 800, and 2,880 points, respectively. The subscriptions extend video length to eight seconds and remove watermarks for commercial use. A 50% discount applies for the first two weeks after launch.
Annual subscriptions are also available, with the Standard, Premium, and Deluxe plans priced at $7.99, $23.99, and $79.99 per month, respectively. Additionally, Vidu has opened applications for its API beta test.
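As a rough illustration of the pricing arithmetic above, the short Python sketch below works out how far each monthly plan's points might go. It uses only the figures quoted in this article; the variable and function names are illustrative, and it assumes that the free tier's rate of 4 points per four-second video also applies to paid plans, which the article does not confirm. The point cost of an eight-second video is not stated, so only four-second clips are counted.

```python
# Figures quoted in the article. Names are illustrative, not Vidu's own.
FREE_POINTS = 80   # free-tier points per month
FREE_VIDEOS = 20   # four-second videos those points allow

MONTHLY_PLANS = {
    "Standard": {"price": 19.99, "points": 240},
    "Premium": {"price": 59.99, "points": 800},
    "Deluxe": {"price": 199.99, "points": 2880},
}

# Per-month price when billed annually, as quoted in the article.
ANNUAL_PRICE_PER_MONTH = {"Standard": 7.99, "Premium": 23.99, "Deluxe": 79.99}

# Implied by the free tier: 80 points / 20 videos = 4 points per video.
points_per_video = FREE_POINTS / FREE_VIDEOS


def plan_summary(name):
    """Summarize one plan under the stated assumptions."""
    plan = MONTHLY_PLANS[name]
    annual = ANNUAL_PRICE_PER_MONTH[name]
    return {
        # ASSUMPTION: paid plans use the same 4-points-per-video rate.
        "four_second_videos": int(plan["points"] // points_per_video),
        "usd_per_point": round(plan["price"] / plan["points"], 4),
        # Saving of annual billing relative to the monthly price.
        "annual_saving_pct": round(100 * (1 - annual / plan["price"]), 1),
    }


for plan_name in MONTHLY_PLANS:
    print(plan_name, plan_summary(plan_name))
```

Under these assumptions, annual billing works out to roughly 60% less per month than monthly billing on every plan, and the price per point falls as the plan tier rises.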
Vidu's informal launch in April came in the wake of OpenAI's video generation tool "Sora," which was introduced in February and attracted widespread attention. Several Chinese companies, including Kuaishou, Zhipu AI, Shengshu, and HiDream, have since announced their own multimodal or video generation models.
Vidu's visual quality approaches cinematic levels in composition, narrative, and lighting, and the model can produce film-grade special effects such as smoke, lens flares, and CG effects.
Earlier, Tang Jiayu, co-founder and CEO of Shengshu Technology, told TMTPost that while domestic AI video generation still lags behind Sora, there is a strong determination to catch up. Tang believes that Sora is currently at a stage of development akin to GPT-2, and that surpassing it is feasible with sufficient engineering experience.
Tang emphasized the potential for achieving Sora-level performance by the end of the year, though exact timelines remain uncertain. He expressed confidence in the company's ability to match and possibly exceed Sora's current capabilities.