Last week Google unveiled a new technology called VideoPoet, showcasing the company's continued work on AI development. With this AI video generator, any autoregressive language model or large language model (LLM) can be transformed into a high-quality video generator. VideoPoet exhibits cutting-edge video generation, especially when it comes to producing an extensive array of high-quality motions.
Animating still images, editing videos for inpainting or outpainting, and creating audio from video are just a few of the many tasks that VideoPoet's multifunctional model can perform. It accepts inputs in the form of text, images, or videos and can convert text to video, image to video, and video to audio. Its versatility, which simplifies a variety of video production tasks by combining several capabilities into a single model, is a major benefit. Unlike other systems, VideoPoet operates on discrete tokens, similar to language models, using tokenizers such as SoundStream for audio and MAGVIT V2 for images and video.
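The discrete-token idea can be illustrated with a toy sketch. The real tokenizers (MAGVIT V2, SoundStream) are learned neural codecs; here a hypothetical five-entry vector-quantization codebook stands in for them, turning continuous pixel values into the kind of integer IDs a language model can predict:

```python
# Toy illustration of discrete tokenization, the idea behind VideoPoet's
# pipeline. CODEBOOK is a made-up stand-in for a learned neural codec.
CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0]

def tokenize(frame):
    """Map each continuous value to the index of its nearest codebook entry."""
    return [min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - v))
            for v in frame]

def detokenize(tokens):
    """Invert tokenization by looking each index back up in the codebook."""
    return [CODEBOOK[t] for t in tokens]

frame = [0.1, 0.9, 0.4]            # a "frame" of normalized pixel values
tokens = tokenize(frame)           # discrete IDs: [0, 4, 2]
reconstructed = detokenize(tokens) # lossy reconstruction: [0.0, 1.0, 0.5]
```

Because video, image, and audio inputs all reduce to sequences of such tokens, a single autoregressive model can handle every modality with the same next-token machinery it uses for text.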
VideoPoet's strong grasp of context and content is demonstrated by its capacity to generate videos with a variety of motions and styles based on particular text inputs. The model shows an impressive ability to preserve object integrity and appearance over extended periods, whether animating a painting or creating a video clip from a descriptive text. Google AI reports that the model can produce videos in either portrait or square orientation, depending on the needs of short-form content. It can also generate audio from a video input.
One noteworthy aspect of VideoPoet is its ability to modify videos interactively: users can direct the model to change motions or actions within a video, which affords a great deal of creative control. Moreover, the model can precisely follow camera motion instructions, which increases its usefulness for producing dynamic and visually appealing footage. VideoPoet's strong multimodal understanding is further demonstrated by its ability to generate believable audio for a created video without any user input.
VideoPoet produces 2-second videos by default. However, given the last one-second segment of a clip, it can predict the next second of video. A video of any length can be created by repeating this process indefinitely.
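The extension loop described above can be sketched as follows. This is a hedged illustration only: `generate_next_second`, the frame rate, and the frame representation are all hypothetical placeholders for the actual model.

```python
# Sketch of VideoPoet's extension trick: repeatedly condition on the last
# second of video to predict the next second. FPS and the generator are
# made-up stand-ins; real VideoPoet predicts token sequences, not integers.
FPS = 8  # hypothetical frame rate

def generate_next_second(last_second_frames):
    # Placeholder for the model's prediction of one new second of frames.
    return [f + 1 for f in last_second_frames]

def extend_video(video_frames, extra_seconds):
    """Grow a clip one second at a time, as the article describes."""
    for _ in range(extra_seconds):
        last_second = video_frames[-FPS:]   # condition on the final second
        video_frames = video_frames + generate_next_second(last_second)
    return video_frames

clip = list(range(2 * FPS))        # a 2-second starter clip (16 frames)
longer = extend_video(clip, 3)     # extended to 5 seconds (40 frames)
```

Each pass feeds only the tail of the clip back into the model, which is why the process can, in principle, be repeated without bound.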
Even though VideoPoet's output still falls well behind the video generation tools from Runway AI and Pika, it shows how far Google has come in AI-based video creation and editing.
To demonstrate the capabilities of VideoPoet, Google's team created a brief film made up of numerous short clips generated by the model. They also asked the Bard AI chatbot to compose a set of prompts telling a short story about a traveling raccoon. Here, you can see the AI-generated video that VideoPoet was able to produce.